CN112417861B - Vehicle data cleaning method and device and storage medium - Google Patents

Vehicle data cleaning method and device and storage medium

Info

Publication number
CN112417861B
Authority
CN
China
Prior art keywords
accessory
standard
original
data
vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011231659.XA
Other languages
Chinese (zh)
Other versions
CN112417861A (en)
Inventor
周凯
金振东
徐嘉赟
张明磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Data Enlighten Beijing Co ltd
Original Assignee
Data Enlighten Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Data Enlighten Beijing Co ltd
Priority to CN202011231659.XA
Publication of CN112417861A
Application granted
Publication of CN112417861B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a vehicle data cleaning method, a vehicle data cleaning device and a storage medium. The method comprises the following steps: acquiring standard vehicle data; acquiring original vehicle data; when the original vehicle data comprises original vehicle type data, screening the standard vehicle types according to the original vehicle type data and the vehicle type atom library to obtain specified standard vehicle types; when the original vehicle data comprises original accessory data, screening the standard accessories according to the original accessory data and the accessory atom library to obtain specified standard accessories; and when the original vehicle data comprises original accessory functional attribute data, screening the standard accessory functional attributes according to the original accessory functional attribute data and the accessory functional attribute atom library to obtain specified standard accessory functional attributes. By screening the standard vehicle data according to the original vehicle data, the invention standardizes the original vehicle data and raises the intelligence level of data cleaning. By performing word segmentation on the original vehicle data, the invention improves the speed and accuracy of subsequent screening.

Description

Vehicle data cleaning method and device and storage medium
Technical Field
The invention relates to the field of vehicle data matching, in particular to a vehicle data cleaning method, a vehicle data cleaning device and a storage medium.
Background
In the automobile aftermarket, the accessory data held by organizations such as accessory manufacturers, accessory distributors and accessory e-commerce platforms generally covers many kinds of accessories, such as multiple brands, multiple varieties, original factory parts, high imitations and packages. Because vehicle types change rapidly and there are many intermediate links, accessory data is miscellaneous, disordered, voluminous and of poor quality, and lacks a unified data management standard. This in turn causes problems such as difficult production management, difficult inventory management, blocked information flow, difficult after-sales service and difficult sales management.
FIG. 1 is a table of factory accessory inventory data in accordance with the prior art. As shown in FIG. 1, existing accessory data is usually kept in Excel or Word files, and the vehicle types that a product fits are usually all entered in a single cell. FIG. 2 is a conventional accessory matching table. As shown in FIG. 2, accessory dealers typically convert the data into standard structured data by manual matching, which is labor-intensive and inefficient. When accessory dealers match and label data themselves, they have no unified standard data to rely on. Because of deviations in their understanding of vehicle type data and because of missing data, the matched data has very low accuracy, needs repeated later adjustment and re-matching, and is usable only in the short term. Meanwhile, accessory manufacturers and dealers currently lack comprehensive, standard original factory accessory codes and functional attribute data corresponding to vehicle types, so self-matched data is severely limited, which is also the main reason the data becomes increasingly difficult to manage. In general, manufacturers capable of data management need to assign a professional data product manager to each category, such as spark plugs, for day-to-day data management, which places high demands on users.
Therefore, how to improve the efficiency, accuracy and intelligence level of vehicle data cleaning while reducing operation difficulty and maintenance cost has become an urgent research focus and a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of this, embodiments of the present invention provide a vehicle data cleaning method, apparatus, and storage medium, so as to solve the problems of low efficiency, low accuracy, high operation difficulty, and high maintenance cost of the vehicle data cleaning method in the prior art.
To this end, the embodiments of the invention provide the following technical solutions:
In a first aspect of the present invention, a vehicle data cleaning method is provided, including:
acquiring standard vehicle data;
wherein the standard vehicle data comprises standard vehicle types, a vehicle type atom library, standard accessories, an accessory atom library, standard accessory functional attributes and an accessory functional attribute atom library;
acquiring original vehicle data;
wherein the original vehicle data comprises at least one of: original vehicle type data, original accessory data and original accessory functional attribute data;
when the original vehicle data comprises original vehicle type data, performing word segmentation on the original vehicle type data to obtain vehicle type atom information, and screening the standard vehicle types according to the vehicle type atom information and the vehicle type atom library to obtain a specified standard vehicle type;
when the original vehicle data comprises original accessory data, performing word segmentation on the original accessory data to obtain accessory atom information, and screening the standard accessories according to the accessory atom information and the accessory atom library to obtain a specified standard accessory;
and when the original vehicle data comprises original accessory functional attribute data, performing word segmentation on the original accessory functional attribute data to obtain accessory functional attribute atom information, and screening the standard accessory functional attributes according to the accessory functional attribute atom information and the accessory functional attribute atom library to obtain a specified standard accessory functional attribute.
Furthermore, each standard vehicle type has a corresponding relation with at least one standard accessory, and each standard accessory has a corresponding relation with at least one standard accessory functional attribute;
the standard vehicle data further comprises original factory accessory codes, and each standard accessory has a corresponding relation with one original factory accessory code;
when the original vehicle data comprises original vehicle type data, the method further comprises obtaining the standard accessories corresponding to the specified standard vehicle type, obtaining the standard accessory functional attributes corresponding to the specified standard vehicle type, and obtaining the original factory accessory codes corresponding to the specified standard vehicle type;
when the original vehicle data comprises original accessory data, the method further comprises obtaining the standard vehicle type corresponding to the specified standard accessory, obtaining the standard accessory functional attributes corresponding to the specified standard accessory, and obtaining the original factory accessory code corresponding to the specified standard accessory;
when the original vehicle data comprises original accessory functional attribute data, the method further comprises obtaining the standard vehicle type corresponding to the specified standard accessory functional attribute, obtaining the standard accessory corresponding to the specified standard accessory functional attribute, and obtaining the original factory accessory code corresponding to the specified standard accessory functional attribute.
Further, when the original vehicle data comprises original vehicle type data, specified original factory vehicle type data is acquired according to the original factory accessory codes corresponding to the specified standard vehicle type, and the specified standard vehicle type is verified against the specified original factory vehicle type data;
when the original vehicle data comprises original accessory data, specified original factory accessory data is acquired according to the original factory accessory code corresponding to the specified standard accessory, and the specified standard accessory is verified against the specified original factory accessory data;
and when the original vehicle data comprises original accessory functional attribute data, specified original factory accessory functional attribute data is acquired according to the original factory accessory code corresponding to the specified standard accessory functional attribute, and the specified standard accessory functional attribute is verified against the specified original factory accessory functional attribute data.
Further, when there are a plurality of specified standard accessories, the method further comprises sorting the plurality of specified standard accessories;
the step of sorting the plurality of specified standard accessories comprises:
setting the score value of each specified standard accessory to 0;
acquiring the accessory functional attributes that have a corresponding relation with any one of the specified standard accessories and recording them as scoring accessory functional attributes;
performing the following step for each scoring accessory functional attribute:
calculating the attribute score of each specified standard accessory relative to the current scoring accessory functional attribute, and adding 1 to the score value of the specified standard accessory with the highest attribute score relative to the current scoring accessory functional attribute;
and sorting the specified standard accessories by score value from high to low.
Further, when the scoring accessory functional attribute is of the tendency type, the attribute score is calculated by the following formula:
[Formula image: Figure BDA0002765418630000041]
where na is a standard accessory, op is a scoring accessory functional attribute, prn is the nth configuration code, S(na, op, have) is the total number of non-deduplicated original factory vehicle data records that include both the standard accessory and the scoring accessory functional attribute, and S(na, op, prn, have) is the total number of original factory vehicle data records that simultaneously include the standard accessory, the scoring accessory functional attribute and the nth configuration code;
when the scoring accessory functional attribute is of the non-tendency type, the attribute score is calculated by the following formula:
[Formula image: Figure BDA0002765418630000042]
where na is a standard accessory, op is a scoring accessory functional attribute, prn is the nth configuration code, S(na, op, none) is the total number of non-deduplicated original factory vehicle data records that exclude the standard accessory and the scoring accessory functional attribute, and S(na, op, prn, none) is the total number of original factory vehicle data records that exclude the standard accessory, the scoring accessory functional attribute and the nth configuration code.
Further, when the score values of at least two specified standard accessories are the same, a core scoring accessory functional attribute is obtained from the scoring accessory functional attributes, and among the tied specified standard accessories, the one with the higher attribute score relative to the core scoring accessory functional attribute is ranked earlier.
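The tie-breaking rule can be illustrated with a minimal Python sketch. The function name `break_ties`, the `scores` dictionary and the `attribute_score` callable are hypothetical names introduced only for illustration; how the core scoring accessory functional attribute is chosen is not specified here, so it is simply passed in.

```python
def break_ties(accessories, scores, core_attribute, attribute_score):
    """Order accessories by score value (high to low).

    Accessories with equal score values are ordered by their attribute
    score relative to the core scoring attribute (high to low).
    `attribute_score(accessory, attribute)` is a caller-supplied function;
    its formula is an assumption left to the caller.
    """
    return sorted(
        accessories,
        key=lambda acc: (-scores[acc], -attribute_score(acc, core_attribute)),
    )


# Illustrative data only: A and B tie on score value, B wins on the core attribute.
example_scores = {"A": 1, "B": 1, "C": 2}
core_scores = {"A": 0.2, "B": 0.6, "C": 0.1}
ordered = break_ties(
    ["A", "B", "C"], example_scores, "core", lambda acc, attr: core_scores[acc]
)
```

Using a composite sort key keeps the primary ordering by score value intact and applies the core attribute only where score values are equal.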
Further, the standard vehicle data further comprises vehicle type configuration scores, and each standard vehicle type corresponds to one vehicle type configuration score;
the vehicle type configuration score is calculated as follows: acquiring the original factory accessory functional attributes corresponding to the standard vehicle type, screening the original factory accessory functional attributes according to the standard accessory functional attributes corresponding to the standard vehicle type to obtain matched original factory accessory functional attributes, and calculating the ratio of the total number of matched original factory accessory functional attributes to the total number of original factory accessory functional attributes;
and when the original vehicle data comprises original vehicle type data, acquiring the vehicle type configuration score corresponding to the specified standard vehicle type.
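As a minimal sketch of the ratio described above, the vehicle type configuration score can be computed in Python as follows. The function name and the attribute strings are hypothetical, and returning 0.0 when there are no original factory attributes is an assumption, since the text does not cover that case.

```python
def configuration_score(original_factory_attributes, standard_attributes):
    """Ratio of matched original factory attributes to all original factory attributes."""
    original = list(original_factory_attributes)
    if not original:
        return 0.0  # assumption: the empty case is not defined in the text
    standard = set(standard_attributes)
    # "Screening": keep only original factory attributes that also appear
    # among the standard accessory functional attributes.
    matched = [attr for attr in original if attr in standard]
    return len(matched) / len(original)


# Illustrative data only: 2 of 4 original factory attributes are matched.
score = configuration_score(["abs", "sunroof", "led", "esp"], ["abs", "esp", "nav"])
```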
Further, the standard vehicle type comprises at least one of: vehicle type name, Ministry of Industry and Information Technology (MIIT) announcement number, distribution channel sales type, vehicle body form and country.
In a second aspect of the present invention, there is provided a vehicle data cleaning apparatus, the apparatus comprising:
first acquiring means for acquiring standard vehicle data;
the standard vehicle data comprises standard vehicle types, a vehicle type atom library, standard accessories, an accessory atom library, standard accessory functional attributes and an accessory functional attribute atom library;
second acquiring means for acquiring original vehicle data;
wherein the raw vehicle data comprises at least one of: original vehicle type data, original accessory data and original accessory functional attribute data;
first screening means for, when the original vehicle data comprises original vehicle type data, performing word segmentation on the original vehicle type data to obtain vehicle type atom information, and screening the standard vehicle types according to the vehicle type atom information and the vehicle type atom library to obtain a specified standard vehicle type;
second screening means for, when the original vehicle data comprises original accessory data, performing word segmentation on the original accessory data to obtain accessory atom information, and screening the standard accessories according to the accessory atom information and the accessory atom library to obtain a specified standard accessory;
and third screening means for, when the original vehicle data comprises original accessory functional attribute data, performing word segmentation on the original accessory functional attribute data to obtain accessory functional attribute atom information, and screening the standard accessory functional attributes according to the accessory functional attribute atom information and the accessory functional attribute atom library to obtain a specified standard accessory functional attribute.
In a third aspect of the invention, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, carry out the steps of the method according to any one of the first aspect of the invention.
The technical scheme of the embodiment of the invention has the following advantages:
The embodiments of the invention provide a vehicle data cleaning method, a vehicle data cleaning device and a storage medium. Existing vehicle data cleaning is generally done by manual searching, which is inefficient and difficult to operate. By screening the standard vehicle data according to the original vehicle data, the invention standardizes the original vehicle data and raises the intelligence level of data cleaning. By performing word segmentation on the original vehicle data, the invention improves the speed and accuracy of subsequent screening.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a table of factory accessory inventory data in accordance with the prior art.
FIG. 2 is a conventional accessory matching table.
FIG. 3 is a flowchart of a vehicle data cleansing method according to an embodiment of the present invention.
Fig. 4 is a block diagram of a vehicle data cleansing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description of the present application, it is to be understood that the terms "center," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," and the like are used in the orientations and positional relationships indicated in the drawings for convenience in describing the present application and for simplicity in description, and are not intended to indicate or imply that the referenced devices or elements must have a particular orientation, be constructed in a particular orientation, and be operated in a particular manner, and are not to be construed as limiting the present application. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first" or "second" may explicitly or implicitly include one or more of the described features. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
In the description of the present application, it is to be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly, e.g., as meaning a fixed connection, a removable connection, or an integral connection; a mechanical connection, an electrical connection, or a communicative connection; a direct connection or an indirect connection through intervening media; or an internal connection between two elements or any other relationship between them. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
In this application, unless expressly stated or limited otherwise, the first feature "on" or "under" the second feature may comprise direct contact of the first and second features, or may comprise contact of the first and second features not directly but through another feature in between. Also, the first feature being "on," "above" and "over" the second feature includes the first feature being directly on and obliquely above the second feature, or merely indicating that the first feature is at a higher level than the second feature. A first feature being "under," "below," and "beneath" a second feature includes the first feature being directly under and obliquely below the second feature, or simply meaning that the first feature is at a lesser elevation than the second feature.
The following disclosure provides many different embodiments or examples for implementing different features of the application. In order to simplify the disclosure of the present application, specific example components and arrangements are described below. Of course, they are merely examples and are not intended to limit the present application. Moreover, the present application may repeat reference numerals and/or letters in the various examples, such repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. In addition, examples of various specific processes and materials are provided herein, but one of ordinary skill in the art may recognize applications of other processes and/or use of other materials.
FIG. 3 is a flowchart of a vehicle data cleaning method according to an embodiment of the present invention. As shown in FIG. 3, the vehicle data cleaning method includes the following steps:
S1: acquiring standard vehicle data;
The standard vehicle data comprises standard vehicle types, a vehicle type atom library, standard accessories, an accessory atom library, standard accessory functional attributes and an accessory functional attribute atom library. In this embodiment, the standard vehicle type includes at least one of: vehicle type name, Ministry of Industry and Information Technology (MIIT) announcement number, distribution channel sales type, vehicle body form and country. The vehicle type atom library comprises colloquial names of vehicle types, the accessory atom library comprises colloquial names of accessories, and the accessory functional attribute atom library comprises colloquial names of accessory functional attributes. Each standard vehicle type corresponds to at least one vehicle type colloquial name, each standard accessory corresponds to at least one accessory colloquial name, and each standard accessory functional attribute corresponds to at least one accessory functional attribute colloquial name. In the automotive trade, the same part has a written name, such as front bumper skin, engine cover or middle net, which is the standard name of the part, and also colloquial names used in the industry, such as front bumper, head cover or ghost mask: the front bumper is the front bumper skin, the head cover is the engine cover, and the ghost mask is the middle net. Moreover, one object may have many different colloquial names; for example, the colloquial names of the front bumper skin include front bumper, front bar, front pump handle, front bumper skin and the like. Standard vehicle types, standard accessories and standard accessory functional attributes are all written names.
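One way to picture an atom library is as a lookup from colloquial names to written (standard) names. The following Python sketch assumes a plain dictionary; the variable and function names are hypothetical, the entries are taken from the examples in the paragraph above, and lowercasing plus whitespace trimming is an added normalization assumption.

```python
# Hypothetical accessory atom library: colloquial name -> written (standard) name.
ACCESSORY_ATOM_LIBRARY = {
    "front bumper": "front bumper skin",
    "front bar": "front bumper skin",
    "front pump handle": "front bumper skin",
    "front bumper skin": "front bumper skin",
    "head cover": "engine cover",
    "ghost mask": "middle net",
}


def to_standard_name(colloquial_name, atom_library):
    """Return the written name for a colloquial name, or None if it is unknown."""
    return atom_library.get(colloquial_name.strip().lower())


standard = to_standard_name(" Head Cover ", ACCESSORY_ATOM_LIBRARY)
```

Several colloquial names map to the same written name, which reflects the many-to-one relation between colloquial names and standard accessories.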
S2: acquiring original vehicle data;
wherein the raw vehicle data includes at least one of: raw vehicle model data, raw accessory data, and raw accessory functional attribute data. In this embodiment, the original vehicle model data includes a colloquial name of a vehicle model, the original accessory data includes a colloquial name of an accessory, and the original accessory functional attribute data includes a colloquial name of an accessory functional attribute.
S3: When the original vehicle data comprises original vehicle type data, word segmentation is performed on the original vehicle type data to obtain vehicle type atom information, and the standard vehicle types are screened according to the vehicle type atom information and the vehicle type atom library to obtain specified standard vehicle types. In this embodiment, one or more specified standard vehicle types may be obtained. The standard vehicle data is preferably screened by an intelligent search center (Ai Search).
When the original vehicle data comprises original accessory data, word segmentation is performed on the original accessory data to obtain accessory atom information, and the standard accessories are screened according to the accessory atom information and the accessory atom library to obtain specified standard accessories. In this embodiment, one or more specified standard accessories may be obtained. The standard vehicle data is preferably screened by an intelligent search center (Ai Search).
When the original vehicle data comprises original accessory functional attribute data, word segmentation is performed on the original accessory functional attribute data to obtain accessory functional attribute atom information, and the standard accessory functional attributes are screened according to the accessory functional attribute atom information and the accessory functional attribute atom library to obtain specified standard accessory functional attributes. In this embodiment, one or more specified standard accessory functional attributes may be obtained. The standard vehicle data is preferably screened by an intelligent search center (Ai Search).
In this embodiment, word segmentation comprises splitting a field into words. For example, segmenting "wining sports edition" yields "wining" and "sports", and segmenting "Passat Bourg" yields "Passat" and "Bourg". The standard keywords preferably include brand, manufacturer, chassis, vehicle series, vehicle type, emission, year, engine, transmission and sales version.
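The segmentation and screening steps can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses greedy longest-match segmentation over whitespace-separated words, which may differ from the segmenter actually used, and the library entries ("standard type A", "standard type B") and function names are hypothetical.

```python
def segment(text, atoms):
    """Greedy longest-match segmentation of `text` into known atoms.

    Multi-word atoms are tried before shorter ones; words that do not
    start any known atom are skipped.
    """
    words = text.lower().split()
    max_len = max((len(atom.split()) for atom in atoms), default=1)
    found, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in atoms:
                found.append(candidate)
                i += n
                break
        else:
            i += 1  # no atom starts at this word
    return found


def screen(atoms_found, atom_library):
    """Return the standard entries whose atom set contains every found atom."""
    wanted = set(atoms_found)
    if not wanted:
        return []
    return sorted(name for name, atoms in atom_library.items() if wanted <= atoms)


# Hypothetical vehicle type atom library: standard vehicle type -> its atoms.
VEHICLE_ATOM_LIBRARY = {
    "standard type A": {"passat", "bourg"},
    "standard type B": {"passat"},
}
all_atoms = set().union(*VEHICLE_ATOM_LIBRARY.values())
atoms_found = segment("Passat Bourg", all_atoms)
matches = screen(atoms_found, VEHICLE_ATOM_LIBRARY)
```

Requiring every found atom to be present narrows the candidates as more atoms are recognized, so a query with only "Passat" would return both hypothetical entries.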
Existing automobile accessory data cleaning is generally done by manual searching, which is inefficient and difficult to operate. By matching and screening the original vehicle data, the invention standardizes the original vehicle data and raises the intelligence level of data cleaning. By performing word segmentation on the original vehicle data, the invention improves the speed and accuracy of subsequent screening. Word segmentation converts original vehicle data of differing semantics and dimensions into vehicle type data of the finest dimension, which facilitates recognition and logical processing and greatly improves the efficiency of standardization. In use, for example, "tittle 2019 comfort version 1.4" may be converted by the vehicle data cleaning method of an embodiment of the present invention into: brand-brand "popular-popular", vehicle group "soar 0J 2019", vehicle type "soar", displacement-engine number "1.4T-DJSA", sales edition type "1.4TSI double clutch 280TSI comfortable type", year style "2019" and standard vehicle type information "MJS9208637". Unlike existing search, which answers a single-vehicle-type query each time the user queries directly, the method performs multi-vehicle-type queries in catalogue-matching scenarios.
In a specific embodiment, each standard vehicle type has a corresponding relation with at least one standard accessory, and each standard accessory has a corresponding relation with at least one standard accessory functional attribute. The standard vehicle data further comprises original factory accessory codes, and each standard accessory has a corresponding relation with one original factory accessory code. When the original vehicle data comprises original vehicle type data, the method further comprises obtaining the standard accessories corresponding to the specified standard vehicle type, obtaining the standard accessory functional attributes corresponding to the specified standard vehicle type, and obtaining the original factory accessory codes corresponding to the specified standard vehicle type. When the original vehicle data comprises original accessory data, the method further comprises obtaining the standard vehicle type corresponding to the specified standard accessory, obtaining the standard accessory functional attributes corresponding to the specified standard accessory, and obtaining the original factory accessory code corresponding to the specified standard accessory. When the original vehicle data comprises original accessory functional attribute data, the method further comprises obtaining the standard vehicle type corresponding to the specified standard accessory functional attribute, obtaining the standard accessory corresponding to the specified standard accessory functional attribute, and obtaining the original factory accessory code corresponding to the specified standard accessory functional attribute.
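The correspondence relations above can be modeled, purely as an illustration, with two small Python data classes. The class names, field names, the `expand_vehicle_type` helper and all sample values (including the code "OE-0001") are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StandardAccessory:
    name: str
    oe_code: str  # each standard accessory corresponds to one original factory accessory code
    attributes: list = field(default_factory=list)  # at least one functional attribute


@dataclass
class StandardVehicleType:
    name: str
    accessories: list = field(default_factory=list)  # at least one standard accessory


def expand_vehicle_type(vehicle):
    """Collect the accessories, functional attributes and codes linked to a vehicle type."""
    return {
        "accessories": [acc.name for acc in vehicle.accessories],
        "attributes": sorted({a for acc in vehicle.accessories for a in acc.attributes}),
        "oe_codes": [acc.oe_code for acc in vehicle.accessories],
    }


# Illustrative data only.
bumper = StandardAccessory("front bumper skin", "OE-0001", ["color", "with radar hole"])
vehicle = StandardVehicleType("standard type A", [bumper])
linked = expand_vehicle_type(vehicle)
```

Starting from a specified standard accessory or functional attribute instead would traverse the same relations in the opposite direction.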
Compared with the prior art, the vehicle data cleaning method provided by the embodiment of the invention can establish a relation between original vehicle data, standard vehicle data and original factory codes. According to the embodiment of the invention, the original vehicle type data, the original accessory data and the original accessory function attribute data are respectively processed, and the mutual verification can be realized when a plurality of processing results are obtained, so that the data stability is improved.
In a specific embodiment, when the original vehicle data includes original vehicle type data, the method further includes obtaining data of a specified original vehicle type according to an original fitting code corresponding to the specified standard vehicle type, and verifying the specified standard vehicle type according to the data of the specified original vehicle type. When the original vehicle data comprises original accessory data, acquiring appointed original accessory data according to original accessory codes corresponding to the appointed standard accessories, and verifying the appointed standard accessories according to the appointed original accessory data. And when the original vehicle data comprises original accessory function attribute data, acquiring appointed original accessory function attribute data according to original accessory codes corresponding to the appointed standard accessory function attribute, and verifying the appointed standard accessory function attribute according to the appointed original accessory function attribute data.
Compared with the prior art, the embodiment of the invention verifies the specified standard vehicle type, the specified standard accessory or the specified standard accessory functional attribute against the original factory data, thereby improving the reliability of the data.
In a specific embodiment, when there are multiple specified standard accessories, the method further comprises sorting the multiple specified standard accessories. The step of sorting the multiple specified standard accessories comprises:
setting the score value of each specified standard accessory to 0;
acquiring the accessory functional attributes corresponding to any one of the specified standard accessories and recording them as scored accessory functional attributes;
performing the following step for each scored accessory functional attribute:
calculating the attribute score of each specified standard accessory relative to the current scored accessory functional attribute, and adding 1 to the score value of the specified standard accessory with the highest attribute score relative to the current scored accessory functional attribute;
sorting the specified standard accessories from high to low according to their score values.
In this embodiment, when the scored accessory functional attribute is a tendency attribute, the attribute score is calculated by the following formula:
Figure BDA0002765418630000111
wherein na is a standard accessory, op is a scored accessory functional attribute, prn is the n-th configuration code, S(na, op, have) is the total number of non-deduplicated original factory vehicle data records that include both the standard accessory and the scored accessory functional attribute, and S(na, op, prn, have) is the total number of original factory vehicle data records that include the standard accessory, the scored accessory functional attribute and the n-th configuration code at the same time;
when the scored accessory functional attribute is a non-tendency attribute, the attribute score is calculated by the following formula:
Figure BDA0002765418630000121
wherein na is a standard accessory, op is a scored accessory functional attribute, prn is the n-th configuration code, S(na, op, none) is the total number of non-deduplicated original factory vehicle data records that do not include the standard accessory and the scored accessory functional attribute, and S(na, op, prn, none) is the total number of original factory vehicle data records that do not include the standard accessory, the scored accessory functional attribute and the n-th configuration code.
Compared with the prior art, the vehicle data cleaning method provided by the embodiment of the invention scores the standard accessories according to their similarity to the original factory accessories, so that the configuration level of each standard accessory can be determined.
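The sorting steps above can be sketched as follows. The attribute score formulas are only available as images in this text, so the sketch takes the attribute score as a caller-supplied function `attribute_score(na, op)`. That function, the name `rank_accessories` and the handling of ties (the first accessory with the maximum attribute score receives the point) are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rank_accessories(
    accessories: Sequence[str],
    scored_attributes: Sequence[str],
    attribute_score: Callable[[str, str], float],
) -> Tuple[List[str], Dict[str, int]]:
    """Rank specified standard accessories by the number of attributes they win."""
    # Step 1: every specified standard accessory starts with a score value of 0.
    score_values = {na: 0 for na in accessories}

    # Step 2: for each scored functional attribute, the accessory with the
    # highest attribute score gets +1. On a tie, max() keeps the first one
    # in input order (an assumption; the source does not say how ties here
    # are resolved).
    for op in scored_attributes:
        best = max(accessories, key=lambda na: attribute_score(na, op))
        score_values[best] += 1

    # Step 3: sort from high to low by score value. sorted() is stable, so
    # accessories with equal score values keep their input order.
    ranking = sorted(accessories, key=lambda na: score_values[na], reverse=True)
    return ranking, score_values
```

Passing the attribute score in as a function keeps the voting loop independent of which of the two formulas applies to a given attribute.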
In a specific embodiment, when the score values of at least two specified standard accessories are the same, a core scored accessory functional attribute is obtained from the scored accessory functional attributes, and the higher the attribute score of a specified standard accessory relative to the core scored accessory functional attribute, the higher that specified standard accessory is ranked.
In this embodiment, the weight of the core scored accessory functional attribute may be increased according to actual requirements.
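The tie-breaking rule can be sketched with a two-part sort key. The function name `rank_with_core_tiebreak` and its parameters are hypothetical. How the core scored accessory functional attribute is chosen is not stated here, so it is simply passed in, and the attribute score is again a caller-supplied function.

```python
from typing import Callable, Dict, List, Sequence


def rank_with_core_tiebreak(
    accessories: Sequence[str],
    score_values: Dict[str, int],
    core_attribute: str,
    attribute_score: Callable[[str, str], float],
) -> List[str]:
    """Sort by score value; break equal score values by the core attribute score."""
    return sorted(
        accessories,
        # Primary key: score value. Secondary key: attribute score relative
        # to the core scored accessory functional attribute. Both descending.
        key=lambda na: (score_values[na], attribute_score(na, core_attribute)),
        reverse=True,
    )
```

With a tuple key, the core attribute only matters when the score values are equal, which matches the condition stated above.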
In a specific embodiment, the standard vehicle data further comprises vehicle type configuration scores, and each standard vehicle type corresponds to one vehicle type configuration score. The vehicle type configuration score is calculated as follows: obtaining the original factory accessory functional attributes corresponding to the standard vehicle type, screening them according to the standard accessory functional attributes corresponding to the standard vehicle type to obtain the matched original factory accessory functional attributes, and calculating the ratio of the total number of matched original factory accessory functional attributes to the total number of original factory accessory functional attributes. When the original vehicle data comprises original vehicle type data, the vehicle type configuration score corresponding to the specified standard vehicle type is acquired.
Compared with the prior art, the vehicle data cleaning method provided by the embodiment of the invention scores the standard vehicle type according to the similarity between the standard vehicle type and the original factory vehicle type, so that the configuration level of the standard vehicle type can be determined.
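The vehicle type configuration score is a simple ratio, sketched below. The function name `vehicle_config_score` is hypothetical. The sketch assumes that an original factory attribute is "matched" when it is exactly equal to one of the standard attributes, and that an empty original factory attribute list yields 0.0. Neither detail is specified in the text.

```python
from typing import Iterable


def vehicle_config_score(
    oem_attributes: Iterable[str],
    standard_attributes: Iterable[str],
) -> float:
    """Ratio of matched original factory attributes to all original factory attributes."""
    oem = list(oem_attributes)
    if not oem:
        # Assumption: avoid division by zero by returning 0.0.
        return 0.0
    standard = set(standard_attributes)
    # Screening step: keep the original factory attributes that also appear
    # among the standard attributes (exact string equality is assumed).
    matched = [attr for attr in oem if attr in standard]
    return len(matched) / len(oem)
```

For example, four original factory attributes of which two also appear among the standard attributes give a score of 0.5.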
In this embodiment, a vehicle data cleaning apparatus is further provided. The apparatus is used to implement the above embodiments and preferred embodiments, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 4 is a block diagram showing the configuration of a vehicle data cleaning apparatus according to an embodiment of the present invention. As shown in Fig. 4, the apparatus comprises:
first acquiring means 11 for acquiring standard vehicle data, wherein the standard vehicle data comprises standard vehicle types, a vehicle type atom library, standard accessories, an accessory atom library, standard accessory functional attributes and an accessory functional attribute atom library;
second acquiring means 12 for acquiring original vehicle data, wherein the original vehicle data comprises at least one of: original vehicle type data, original accessory data and original accessory functional attribute data;
first screening means 13 for, when the original vehicle data comprises original vehicle type data, performing word segmentation processing on the original vehicle type data to obtain vehicle type atom information, and screening the standard vehicle types according to the vehicle type atom information and the vehicle type atom library to obtain a specified standard vehicle type;
second screening means 14 for, when the original vehicle data comprises original accessory data, performing word segmentation processing on the original accessory data to obtain accessory atom information, and screening the standard accessories according to the accessory atom information and the accessory atom library to obtain specified standard accessories;
third screening means 15 for, when the original vehicle data comprises original accessory functional attribute data, performing word segmentation processing on the original accessory functional attribute data to obtain accessory functional attribute atom information, and screening the standard accessory functional attributes according to the accessory functional attribute atom information and the accessory functional attribute atom library to obtain a specified standard accessory functional attribute.
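What a screening means does can be illustrated with a small sketch of segmentation followed by screening. This section does not fix a segmentation algorithm, so the sketch swaps in forward maximum matching against the atom library, which is an assumption. It also assumes that each standard entry is stored with its own set of atoms and that an entry is kept when it contains every atom found in the original text. The names `segment` and `screen` are hypothetical.

```python
from typing import Dict, List, Set


def segment(text: str, atom_library: Set[str]) -> List[str]:
    """Split text into atoms by forward maximum matching (assumed technique)."""
    max_len = max((len(atom) for atom in atom_library), default=0)
    atoms: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in atom_library:
                atoms.append(candidate)
                i += size
                break
        else:
            # No atom starts here (for example a space): skip one character.
            i += 1
    return atoms


def screen(atoms: List[str], standard_items: Dict[str, Set[str]]) -> List[str]:
    """Keep the standard entries whose atom sets contain every extracted atom."""
    wanted = set(atoms)
    if not wanted:
        return []
    return [name for name, item_atoms in standard_items.items()
            if wanted <= item_atoms]
```

The same two functions could serve all three screening means, with a different atom library and a different set of standard entries supplied for vehicle types, accessories and accessory functional attributes.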
Embodiments of the present invention further provide a non-transitory computer storage medium storing computer-executable instructions that can perform the vehicle data cleaning method in any of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kinds described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (8)

1. A vehicle data cleansing method, characterized in that the data cleansing method comprises:
acquiring standard vehicle data;
the standard vehicle data comprises standard vehicle types, a vehicle type atom library, standard accessories, an accessory atom library, standard accessory functional attributes and an accessory functional attribute atom library;
acquiring original vehicle data;
wherein the raw vehicle data comprises at least one of: original vehicle type data, original accessory data and original accessory functional attribute data;
when the original vehicle data comprises original vehicle type data, performing word segmentation processing on the original vehicle type data to obtain vehicle type atom information, and screening the standard vehicle types according to the vehicle type atom information and the vehicle type atom library to obtain a specified standard vehicle type;
when the original vehicle data comprises original accessory data, performing word segmentation processing on the original accessory data to obtain accessory atom information, and screening the standard accessories according to the accessory atom information and the accessory atom library to obtain specified standard accessories; when there are a plurality of the specified standard accessories, sorting the plurality of specified standard accessories;
the step of ordering a plurality of said specified standard accessories comprises:
setting the score value of each of the specified standard accessories to 0;
acquiring the accessory functional attributes which have a corresponding relation with any one of the specified standard accessories and recording them as scored accessory functional attributes;
performing the following step for each of the scored accessory functional attributes:
respectively calculating the attribute score of each specified standard accessory relative to the current scored accessory functional attribute, and adding 1 to the score value of the specified standard accessory with the highest attribute score relative to the current scored accessory functional attribute;
sorting the specified standard accessories from high to low according to the score values;
when the scored accessory functional attribute is a tendency attribute, the attribute score is calculated by the following formula:
Figure FDA0003070829670000021
wherein na is a standard accessory, op is a scored accessory functional attribute, prn is the n-th configuration code, S(na, op, have) is the total number of non-deduplicated original factory vehicle data records that include both the standard accessory and the scored accessory functional attribute, and S(na, op, prn, have) is the total number of original factory vehicle data records that include the standard accessory, the scored accessory functional attribute and the n-th configuration code at the same time;
when the scored accessory functional attribute is a non-tendency attribute, the attribute score is calculated by the following formula:
Figure FDA0003070829670000022
wherein na is a standard accessory, op is a scored accessory functional attribute, prn is the n-th configuration code, S(na, op, none) is the total number of non-deduplicated original factory vehicle data records that do not include the standard accessory and the scored accessory functional attribute, and S(na, op, prn, none) is the total number of original factory vehicle data records that do not include the standard accessory, the scored accessory functional attribute and the n-th configuration code;
and when the original vehicle data comprises original accessory functional attribute data, performing word segmentation processing on the original accessory functional attribute data to obtain accessory functional attribute atom information, and screening the standard accessory functional attributes according to the accessory functional attribute atom information and the accessory functional attribute atom library to obtain a specified standard accessory functional attribute.
2. The vehicle data cleaning method according to claim 1, wherein each standard vehicle type is in correspondence with at least one standard accessory, and each standard accessory is in correspondence with at least one standard accessory functional attribute;
the standard vehicle data further comprises original factory accessory codes, and each standard accessory has a corresponding relation with one original factory accessory code;
when the original vehicle data comprises original vehicle type data, the method further comprises obtaining the standard accessories corresponding to the specified standard vehicle type, obtaining the standard accessory functional attributes corresponding to the specified standard vehicle type, and obtaining the original factory accessory codes corresponding to the specified standard vehicle type;
when the original vehicle data comprises original accessory data, the method further comprises obtaining the standard vehicle type corresponding to the specified standard accessory, obtaining the standard accessory functional attributes corresponding to the specified standard accessory, and obtaining the original factory accessory code corresponding to the specified standard accessory;
when the original vehicle data comprises original accessory functional attribute data, the method further comprises obtaining the standard vehicle type corresponding to the specified standard accessory functional attribute, obtaining the standard accessory corresponding to the specified standard accessory functional attribute, and obtaining the original factory accessory code corresponding to the specified standard accessory functional attribute.
3. The vehicle data cleaning method according to claim 2, wherein when the original vehicle data comprises original vehicle type data, the method further comprises acquiring specified original factory vehicle type data according to the original factory accessory code corresponding to the specified standard vehicle type, and verifying the specified standard vehicle type according to the specified original factory vehicle type data;
when the original vehicle data comprises original accessory data, acquiring specified original factory accessory data according to the original factory accessory code corresponding to the specified standard accessory, and verifying the specified standard accessory according to the specified original factory accessory data;
and when the original vehicle data comprises original accessory functional attribute data, acquiring specified original factory accessory functional attribute data according to the original factory accessory code corresponding to the specified standard accessory functional attribute, and verifying the specified standard accessory functional attribute according to the specified original factory accessory functional attribute data.
4. The vehicle data cleaning method according to claim 1, wherein when the score values of at least two of the specified standard accessories are the same, a core scored accessory functional attribute is obtained from the scored accessory functional attributes, and the higher the attribute score of a specified standard accessory relative to the core scored accessory functional attribute, the higher that specified standard accessory is ranked.
5. The vehicle data cleansing method according to claim 1, wherein the standard vehicle data further includes vehicle model configuration scores, one for each of the standard vehicle models;
the vehicle type configuration score is calculated as follows: acquiring the original factory accessory functional attributes corresponding to the standard vehicle type, screening the original factory accessory functional attributes according to the standard accessory functional attributes corresponding to the standard vehicle type to obtain matched original factory accessory functional attributes, and calculating the ratio of the total number of the matched original factory accessory functional attributes to the total number of the original factory accessory functional attributes;
and when the original vehicle data comprises original vehicle type data, acquiring the vehicle type configuration score corresponding to the specified standard vehicle type.
6. The vehicle data cleansing method according to any one of claims 1 to 5, wherein the standard vehicle type includes at least one of: vehicle type name, Ministry of Industry and Information Technology (MIIT) announcement number, distribution channel sales type, vehicle body form and country.
7. A vehicle data cleansing apparatus for use in the vehicle data cleansing method according to any one of claims 1 to 6, characterized by comprising:
first acquiring means for acquiring standard vehicle data;
the standard vehicle data comprises standard vehicle types, a vehicle type atom library, standard accessories, an accessory atom library, standard accessory functional attributes and an accessory functional attribute atom library;
second acquiring means for acquiring original vehicle data;
wherein the raw vehicle data comprises at least one of: original vehicle type data, original accessory data and original accessory functional attribute data;
first screening means for, when the original vehicle data comprises original vehicle type data, performing word segmentation processing on the original vehicle type data to obtain vehicle type atom information, and screening the standard vehicle types according to the vehicle type atom information and the vehicle type atom library to obtain a specified standard vehicle type;
second screening means for, when the original vehicle data comprises original accessory data, performing word segmentation processing on the original accessory data to obtain accessory atom information, and screening the standard accessories according to the accessory atom information and the accessory atom library to obtain specified standard accessories;
and third screening means for, when the original vehicle data comprises original accessory functional attribute data, performing word segmentation processing on the original accessory functional attribute data to obtain accessory functional attribute atom information, and screening the standard accessory functional attributes according to the accessory functional attribute atom information and the accessory functional attribute atom library to obtain a specified standard accessory functional attribute.
8. A computer-readable storage medium having stored thereon computer instructions, which when executed by a processor, perform the steps of the method of any of claims 1-6.
CN202011231659.XA 2020-11-06 2020-11-06 Vehicle data cleaning method and device and storage medium Active CN112417861B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011231659.XA CN112417861B (en) 2020-11-06 2020-11-06 Vehicle data cleaning method and device and storage medium

Publications (2)

Publication Number Publication Date
CN112417861A CN112417861A (en) 2021-02-26
CN112417861B true CN112417861B (en) 2021-07-23

Family

ID=74781991

Country Status (1)

Country Link
CN (1) CN112417861B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant