CN111158666A - Entity normalization processing method, device, equipment and storage medium - Google Patents

Publication number
CN111158666A
Authority
CN
China
Prior art keywords
comparison
entity
rule
target attribute
entity normalization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911379440.1A
Other languages
Chinese (zh)
Other versions
CN111158666B (en)
Inventor
王冠朝
方舟
江涛
仲夏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201911379440.1A priority Critical patent/CN111158666B/en
Publication of CN111158666A publication Critical patent/CN111158666A/en
Application granted granted Critical
Publication of CN111158666B publication Critical patent/CN111158666B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 - Arrangements for software engineering
    • G06F8/30 - Creation or generation of source code

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The application discloses an entity normalization processing method, device, equipment and storage medium, and relates to entity normalization processing technology. The specific implementation scheme is as follows: receiving rule parameters related to an entity normalization strategy input by a user; generating program code corresponding to the entity normalization strategy according to the rule parameters and a preset code generation rule; and running the program code corresponding to the entity normalization strategy to perform normalization judgment on entities in a preset entity data set, so as to cluster identical entities. Because the user only needs to input the rule parameters related to the entity normalization strategy, and the corresponding program code is generated automatically from the rule parameters and the preset code generation rule, no user programming is needed. This reduces labor, development and learning costs, lowers the threshold of data production, makes the entity normalization strategy easy to modify, and improves the efficiency of entity normalization processing; the method can be applied to entity normalization of data in any field.

Description

Entity normalization processing method, device, equipment and storage medium
Technical Field
The application relates to the technical field of data processing, in particular to an entity normalization processing technology.
Background
When constructing knowledge graph data, a plurality of different data sources often has to be used, so normalizing and fusing the same entities from different data sources is an important task. For example, data for the movie "Weathering with You" (天气之子, literally "the son of the weather") comes from three different websites; the relevant attribute values, here release dates, are 2019-11-01 (China), 2019-07-19 (Japan) and 2019-11-01 (China) respectively, and the director is Makoto Shinkai (新海诚) in all three, so the three records refer to the same entity and entity disambiguation is required. The entity disambiguation process is divided into two steps, entity normalization and fusion. Entity normalization groups records of the same entity into the same set; fusion then merges the entities in the same set, using a strategy to select the preferred attribute values, so that they are finally fused into one entity.
Existing entity normalization methods generally either require a research and development engineer to write a program based on data investigated in advance and realize entity normalization by running the program code, or train an entity normalization model on training data and realize entity normalization through that model. The first approach, in which engineers program by hand, consumes a large amount of labor, is difficult to learn, and lacks any guarantee of standardization. The model-based approach requires a large amount of labeled data for training and a professional algorithm engineer to iterate on the model, so it is difficult to apply to business scenarios and has poor industrial universality and applicability.
Disclosure of Invention
The application provides an entity normalization processing method, an entity normalization processing device and a storage medium, so that corresponding program codes are automatically generated according to rule parameters related to an entity normalization strategy input by a user, and the manpower development cost and the learning cost are reduced.
A first aspect of the present application provides an entity normalization processing method, comprising:
receiving a rule parameter related to an entity normalization strategy input by a user;
generating a program code corresponding to the entity normalization strategy according to the rule parameters and a preset code generation rule;
and running a program code corresponding to the entity normalization strategy, and carrying out normalization judgment on entities in a preset entity data set so as to cluster the same entities.
In this embodiment, the user only needs to input the rule parameters related to the entity normalization strategy, and the program code corresponding to the entity normalization strategy is generated automatically. No user programming is needed, which reduces labor, development and learning costs, lowers the threshold of data production, makes the entity normalization strategy easy to modify, and improves the efficiency of entity normalization processing; the method can be applied to entity normalization of data in any field.
In one possible design, the rule parameters include at least one target attribute to be compared, comparison condition parameters corresponding to the target attributes, and comparison rules combined among comparison conditions corresponding to the target attributes.
In a possible design, the generating a program code corresponding to an entity normalization policy according to the rule parameter and a preset code generation rule includes:
aiming at any target attribute to be compared, acquiring a comparison function of the target attribute according to the type of the target attribute and a comparison condition parameter corresponding to the target attribute;
calling a corresponding comparison function and determining a logic operation type according to each comparison rule to obtain a program code of the comparison rule;
and obtaining a program code corresponding to the entity normalization strategy according to the program code of each comparison rule.
In one possible design, the comparison condition parameters corresponding to the target attribute include a type of the target attribute, a comparison condition corresponding to the target attribute, and a degree of strictness of a comparison process.
In a possible design, the obtaining a comparison function of the target attribute according to the type of the target attribute and the comparison condition parameter corresponding to the target attribute includes:
determining a comparison method parameter in the comparison function according to the type of the target attribute;
determining a multi-valued comparison condition parameter in the comparison function according to the strictness degree of the comparison process, wherein the multi-valued comparison condition parameter comprises: the multiple values are identical, at least one identical, or completely different;
determining supplementary parameters in the comparison function according to the comparison condition and/or a preset data cleaning instruction;
and obtaining a comparison function of the target attribute according to the target attribute, the comparison method parameter, the multi-valued comparison condition parameter and the supplementary parameter.
In a possible design, the obtaining, according to the program code of each comparison rule, the program code corresponding to the entity normalization policy includes:
and receiving a priority order of the comparison rules set by a user, setting priorities for the program codes of the comparison rules according to the priority order of the comparison rules, and operating the program codes of the comparison rules according to the priorities when operating the program codes corresponding to the entity normalization strategy.
In one possible design, the running the program code corresponding to the entity normalization policy further includes:
receiving a start instruction of a user, and running the program code corresponding to the entity normalization strategy according to the start instruction; and/or
receiving a stop instruction of a user, and stopping running the program code corresponding to the entity normalization strategy according to the stop instruction;
after clustering the same entity, the method further comprises:
and receiving a viewing result instruction of a user, and displaying the clustering result according to the viewing result instruction.
A second aspect of the present application provides an entity normalization processing apparatus, including:
the input module is used for receiving the rule parameters related to the entity normalization strategy input by a user;
the processing module is used for generating a program code corresponding to the entity normalization strategy according to the rule parameters and a preset code generation rule;
and the operation module is used for operating the program code corresponding to the entity normalization strategy and carrying out normalization judgment on the entities in the preset entity data set so as to cluster the same entities.
In one possible design, the rule parameters include at least one target attribute to be compared, comparison condition parameters corresponding to the target attributes, and comparison rules combined among comparison conditions corresponding to the target attributes.
In one possible design, the processing module is to:
aiming at any target attribute to be compared, acquiring a comparison function of the target attribute according to the type of the target attribute and a comparison condition parameter corresponding to the target attribute;
calling a corresponding comparison function and determining a logic operation type according to each comparison rule to obtain a program code of the comparison rule;
and obtaining a program code corresponding to the entity normalization strategy according to the program code of each comparison rule.
In one possible design, the comparison condition parameters corresponding to the target attribute include a type of the target attribute, a comparison condition corresponding to the target attribute, and a degree of strictness of a comparison process.
In one possible design, the processing module is to:
determining a comparison method parameter in the comparison function according to the type of the target attribute;
determining a multi-valued comparison condition parameter in the comparison function according to the strictness degree of the comparison process, wherein the multi-valued comparison condition parameter comprises: the multiple values are identical, at least one identical, or completely different;
determining supplementary parameters in the comparison function according to the comparison condition and/or a preset data cleaning instruction;
and obtaining a comparison function of the target attribute according to the target attribute, the comparison method parameter, the multi-valued comparison condition parameter and the supplementary parameter.
In one possible design, the processing module is to:
and receiving a priority order of the comparison rules set by a user, setting priorities for the program codes of the comparison rules according to the priority order of the comparison rules, and operating the program codes of the comparison rules according to the priorities when operating the program codes corresponding to the entity normalization strategy.
In a possible design, the input module is further configured to receive a start instruction of a user;
the operation module is further used for running the program code corresponding to the entity normalization strategy according to the start instruction; and/or
the input module is further used for receiving a stop instruction of a user;
the operation module is further used for stopping running the program code corresponding to the entity normalization strategy according to the stop instruction;
the input module is further used for receiving a result viewing instruction of a user;
the operation module is further used for displaying the clustering result according to the result viewing instruction.
A third aspect of the present application provides an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
A fourth aspect of the present application provides a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of the first aspect.
A fifth aspect of the application provides a computer program comprising program code for performing the method according to the first aspect when the computer program is run by a computer.
A sixth aspect of the present application provides an entity normalization processing method, including:
receiving a rule parameter related to an entity normalization strategy input by a user;
acquiring an entity normalization strategy according to the rule parameters;
and carrying out normalization judgment on the entities in the preset entity data set according to the entity normalization strategy, and outputting a normalization judgment result.
One embodiment in the above application has the following advantages or benefits: the user only needs to input the rule parameters related to the entity normalization strategy, the program codes corresponding to the entity normalization strategy can be automatically generated according to the rule parameters and the preset code generation rules, user programming is not needed, the manpower development cost and the learning cost are reduced, the threshold of data production is reduced, the entity normalization strategy is convenient to modify, the efficiency of entity normalization processing is improved, and the method can be applied to entity normalization processing of data in any field. The user interaction interface is used for visual operation, so that the data production cost and the threshold are greatly reduced, the entity normalization strategy is convenient to make and modify, and convenience is brought to the user for flexibly processing the entity data.
Other effects of the above-described alternative will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a flowchart of an entity normalization processing method provided in an embodiment of the present application;
FIG. 2 is a flowchart of an entity normalization processing method according to another embodiment of the present application;
FIG. 3 is a block diagram of an entity normalization processing apparatus according to another embodiment of the present application;
FIG. 4 is a block diagram of an electronic device for implementing the entity normalization processing method according to the embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Existing entity normalization methods in which engineers program by hand consume a large amount of labor, are difficult to learn, and lack any guarantee of standardization; methods that use a model require a large amount of labeled data for training and a professional algorithm engineer to iterate on the model, so they are difficult to apply to business scenarios and have poor industrial universality and applicability. To address these technical problems, in the present application the user only needs to input the rule parameters related to the entity normalization strategy, and the program code corresponding to the strategy is generated automatically according to the rule parameters and the preset code generation rule. No user programming is needed, which reduces labor, development and learning costs, lowers the threshold of data production, makes the entity normalization strategy easy to modify, and improves the efficiency of entity normalization processing; the method can be applied to entity normalization of data in any field.
The following describes the entity normalization process in detail with reference to specific embodiments.
An embodiment of the present application provides an entity normalization processing method, and fig. 1 is a flowchart of the entity normalization processing method provided in the embodiment of the present invention. The execution subject may be an electronic device, as shown in fig. 1, and the entity normalization processing method includes the following specific steps:
S101, receiving rule parameters related to an entity normalization strategy input by a user.
In this embodiment, the user may input rule parameters related to the entity normalization strategy, where the entity normalization strategy may include at least one comparison rule and each comparison rule may include at least one rule parameter. The rule parameters may be semantic, easily understood parameters that the user expresses in natural language. A first user interaction interface may be provided for inputting the rule parameters, which include at least one target attribute to be compared, the comparison condition parameters corresponding to each target attribute, and the comparison rule that combines the comparison conditions of the target attributes; once the user has input the rule parameters through the first user interaction interface, a comparison rule is obtained. For example, with the dubbing actor as a target attribute to be compared, the corresponding comparison condition parameters may include, but are not limited to, the type of the target attribute (text, number, time, etc.), the comparison condition for the target attribute (identical, inclusion relationship, edit distance, semantic similarity, etc.) and the strictness of the comparison process (loose, strict, etc.). A comparison rule combining the comparison conditions may be, for example, "when condition 1, condition 2 and condition 3 are satisfied simultaneously, the entities are regarded as the same entity", in which case the logical operation among the three comparison conditions is "and". In addition, the first user interaction interface provides functions for adding and deleting rule parameters; after an instruction from the user to add or delete a rule parameter is received, the corresponding add or delete action is executed.
In addition, this embodiment also provides a second user interaction interface with functions including, but not limited to, enabling, disabling, editing, deleting and prioritizing any comparison rule; the user may jump to the first user interaction interface by clicking an edit button in the second user interaction interface. The second user interaction interface presents the entity normalization strategy in a visual, easily understood form.
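As an illustration of the rule parameters described for S101, the following is a minimal sketch of how they might be held in memory once entered through the interfaces. All class and field names here (`ConditionParams`, `ComparisonRule`, `NormalizationStrategy`, `enabled`, `priority`) are assumptions made for illustration and do not appear in the application.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConditionParams:
    """Comparison condition parameters for one target attribute."""
    attribute: str   # target attribute to compare, e.g. "dubbing_actor"
    attr_type: str   # "text", "number", "time", ...
    condition: str   # "identical", "inclusion", "edit_distance", ...
    strictness: str  # "loose" or "strict"


@dataclass
class ComparisonRule:
    """Conditions combined by one logical operation."""
    conditions: List[ConditionParams]
    logic: str = "and"    # logical operation among the conditions
    enabled: bool = True  # hypothetical flag for enabling/disabling a rule
    priority: int = 0     # assumption: a smaller number runs first


@dataclass
class NormalizationStrategy:
    """An entity normalization strategy: at least one comparison rule."""
    rules: List[ComparisonRule] = field(default_factory=list)

    def add_rule(self, rule: ComparisonRule) -> None:
        self.rules.append(rule)

    def delete_rule(self, index: int) -> None:
        del self.rules[index]


strategy = NormalizationStrategy()
strategy.add_rule(ComparisonRule(conditions=[
    ConditionParams("duration", "number", "identical", "strict"),
    ConditionParams("dubbing_actor", "text", "identical", "loose"),
]))
```

Keeping the strategy as plain data makes it straightforward to re-generate the program code whenever the user edits a rule.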
And S102, generating a program code corresponding to the entity normalization strategy according to the rule parameters and a preset code generation rule.
In this embodiment, after the rule parameters related to the entity normalization policy are obtained, the rule parameters related to the entity normalization policy may be automatically translated into program codes corresponding to the entity normalization policy according to a preset code generation rule, and further, the program codes may be stored or directly run. The preset code generation rule in this embodiment may be a rule for how to generate the comparison function according to the rule parameters, and a unified template of the comparison function may be first defined, which includes some necessary parameters of the comparison function (e.g., comparison method, comparison condition, supplementary parameters, etc.), and these parameters may be determined according to the rule parameters input by the user.
In addition, this embodiment makes it easy for the user to modify a comparison rule: the user only needs to modify the rule parameters, and the corresponding program code is automatically updated according to the modified rule parameters, which reduces labor cost and improves efficiency.
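The translation of rule parameters into program code through a unified template can be sketched as follows. This is only one possible reading of S102: the template text, the `generate_code` helper and the placeholder `compare` runtime function are assumptions for illustration, not the code generation rule used in the application.

```python
from string import Template

# Hypothetical unified template: every generated function delegates to one
# shared runtime helper named `compare`.
FUNCTION_TEMPLATE = Template(
    "def ${name}(entity_a, entity_b):\n"
    "    return compare(entity_a, entity_b, cmpattr=${cmpattr},\n"
    "                   multicmp=${multicmp}, singlecmp=${singlecmp},\n"
    "                   compconf=${compconf})\n"
)


def generate_code(name, cmpattr, multicmp, singlecmp, compconf):
    """Translate rule parameters into source text for one comparison function."""
    return FUNCTION_TEMPLATE.substitute(
        name=name,
        cmpattr=repr(cmpattr),
        multicmp=repr(multicmp),
        singlecmp=repr(singlecmp),
        compconf=repr(compconf),
    )


def compare(entity_a, entity_b, cmpattr, multicmp, singlecmp, compconf):
    """Placeholder runtime helper (assumption): exact equality only."""
    return 1.0 if entity_a.get(cmpattr) == entity_b.get(cmpattr) else 0.0


source = generate_code("cmp_duration", "duration", "identical", "Exact", {})
namespace = {"compare": compare}
exec(source, namespace)  # load the generated function into a namespace
cmp_duration = namespace["cmp_duration"]
print(source)
```

When the user edits a rule parameter, calling `generate_code` again with the new values yields the updated source. A production system would also need to validate `name` before executing generated text.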
S103, running a program code corresponding to the entity normalization strategy, and carrying out normalization judgment on entities in a preset entity data set so as to cluster the same entities.
In this embodiment, before the program code corresponding to the entity normalization strategy is run, the entity data set to be normalized may be determined; for example, the user may select a data source through a predetermined user interaction interface. When the code runs, any two pieces of entity data in the entity data set may be compared according to the comparison rules of the normalization strategy to judge whether they are the same entity, and entities judged to be the same are then clustered, yielding the normalization result for the entity data set.
Furthermore, a start instruction of a user can be received, and the program code corresponding to the entity normalization strategy is run according to the start instruction; and/or a stop instruction of a user is received, and running of the program code corresponding to the entity normalization strategy is stopped according to the stop instruction. That is, in this embodiment, the user can start and stop the program code at any time as needed. In this embodiment, a third user interaction interface is provided, and a task management function is provided on the third user interaction interface, so that the user can control the running and stopping of the program code.
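The pairwise comparison and clustering in S103 can be sketched as below. The application does not say how pairwise judgments are turned into clusters; this sketch assumes that "same entity" is treated as transitive and uses a union-find structure. The function names and the sample records are hypothetical.

```python
from itertools import combinations


def cluster_entities(entities, is_same_entity):
    """Group entities so that every pair judged 'same' shares a cluster."""
    parent = list(range(len(entities)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare any two pieces of entity data in the data set.
    for i, j in combinations(range(len(entities)), 2):
        if is_same_entity(entities[i], entities[j]):
            parent[find(i)] = find(j)

    clusters = {}
    for index, entity in enumerate(entities):
        clusters.setdefault(find(index), []).append(entity)
    return list(clusters.values())


# Hypothetical records from three data sources plus one unrelated record.
records = [
    {"source": "site_a", "name": "film X", "director": "director 1"},
    {"source": "site_b", "name": "film X", "director": "director 1"},
    {"source": "site_c", "name": "film X", "director": "director 1"},
    {"source": "site_a", "name": "film Y", "director": "director 2"},
]


def same_name_and_director(a, b):
    return a["name"] == b["name"] and a["director"] == b["director"]


clusters = cluster_entities(records, same_name_and_director)
```

Comparing all pairs costs quadratic time in the size of the data set, so a large data set would likely need some form of candidate pre-grouping, which is outside what this sketch shows.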
Further, after the same entities are clustered, a viewing result instruction of a user is received, and a clustering result is displayed according to the viewing result instruction. For example, the user may click a view result button, and the clustering result may be presented, and the clustering result includes related information of the same entity, such as a data source.
In the entity normalization processing method provided by this embodiment, rule parameters related to an entity normalization strategy input by a user are received; program code corresponding to the entity normalization strategy is generated according to the rule parameters and a preset code generation rule; and the program code corresponding to the entity normalization strategy is run to perform normalization judgment on entities in a preset entity data set, so as to cluster identical entities. In this embodiment, the user only needs to input the rule parameters related to the entity normalization strategy, and the corresponding program code is generated automatically. No user programming is needed, which reduces labor, development and learning costs, lowers the threshold of data production, makes the entity normalization strategy easy to modify, and improves the efficiency of entity normalization processing; the method can be applied to entity normalization of data in any field.
On the basis of any of the above embodiments, the rule parameters include at least one target attribute to be compared, comparison condition parameters corresponding to the target attributes, and comparison rules combined among comparison conditions corresponding to the target attributes.
Further, as shown in fig. 2, the generating, according to the rule parameter and the preset code generation rule in S102 in the foregoing embodiment, the program code corresponding to the entity normalization policy may specifically include:
S201, for any target attribute to be compared, obtaining a comparison function of the target attribute according to the type of the target attribute and the comparison condition parameters corresponding to the target attribute.
In this embodiment, a unified template of the comparison function in the preset code generation rule may be predefined, wherein the unified template includes some necessary parameters of the comparison function (e.g., comparison method, comparison condition, supplementary parameters, etc.), and the parameters may be determined according to the rule parameters input by the user. Namely, the comparison function of the target attribute can be obtained according to the type of the target attribute and the comparison condition parameter corresponding to the target attribute.
In an optional embodiment, the comparison condition parameters corresponding to the target attribute include the type of the target attribute (e.g., text, number, time, etc.), the comparison condition corresponding to the target attribute (identical, inclusion relationship, edit distance, semantic similarity, etc.) and the strictness of the comparison process (loose, strict, etc.).
Further, the obtaining a comparison function of the target attribute according to the type of the target attribute and the comparison condition parameter corresponding to the target attribute includes:
determining a comparison method parameter in the comparison function according to the type of the target attribute;
determining a multi-valued comparison condition parameter in the comparison function according to the strictness degree of the comparison process, wherein the multi-valued comparison condition parameter comprises: the multiple values are identical, at least one identical, or completely different;
determining supplementary parameters in the comparison function according to the comparison condition and/or a preset data cleaning instruction;
and obtaining a comparison function of the target attribute according to the target attribute, the comparison method parameter, the multi-valued comparison condition parameter and the supplementary parameter.
In this embodiment, the function name of the comparison function serves as the key by which the comparison function is called, so each comparison function has a unique function name. The comparison function may have four parameters: cmpattr, multicmp, singlecmp and compconf.
The cmpattr parameter identifies the target attribute to be compared, such as the duration attribute, the region attribute or the dubbing actor attribute of the entities to be compared.
The multicmp parameter is the multi-valued comparison condition parameter. For example, the two entities to be compared are both "the son of the weather", and the target attribute "dubbing actor" of entity 1 is a list of several values [ … ]; the multicmp parameter specifies how such multi-valued attributes of the two entities are matched. The multicmp parameter may specifically be "identical", "at least one identical", "completely different", and the like, and may be determined according to the strictness of the comparison process: for example, if the comparison process is strict, the multicmp parameter is determined to be "identical" or "completely different", and if the comparison process is loose, it is determined to be "at least one identical".
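The three multi-valued conditions can be sketched as follows. The string spellings of the modes, the function names, and the simplified strictness mapping (strict always mapped to "identical") are assumptions for illustration; the sample values are placeholders.

```python
def multi_value_match(values_a, values_b, multicmp,
                      single_match=lambda a, b: a == b):
    """Apply a multi-valued comparison condition to two lists of values."""
    # For each value on one side, is there a matching value on the other?
    a_hits = [any(single_match(a, b) for b in values_b) for a in values_a]
    b_hits = [any(single_match(a, b) for a in values_a) for b in values_b]
    if multicmp == "identical":
        return all(a_hits) and all(b_hits)
    if multicmp == "at_least_one_identical":
        return any(a_hits)
    if multicmp == "completely_different":
        return not any(a_hits)
    raise ValueError(f"unknown multicmp mode: {multicmp}")


def multicmp_from_strictness(strictness):
    """Simplified mapping (assumption): strict -> identical, loose -> any."""
    return "identical" if strictness == "strict" else "at_least_one_identical"


cast_1 = ["actor A", "actor B"]
cast_2 = ["actor B", "actor C"]
loose_result = multi_value_match(cast_1, cast_2,
                                 multicmp_from_strictness("loose"))
strict_result = multi_value_match(cast_1, cast_2,
                                  multicmp_from_strictness("strict"))
```

Here the two lists share one value, so the loose comparison succeeds while the strict one does not.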
The singlecmp parameter specifies the comparison method used when comparing single values. For example, a singlecmp value of "Float" denotes floating-point comparison, with the threshold of the comparison, "threshold: 0.25", defined in the supplementary parameters; that is, the difference between the floating-point values of the two entities must be smaller than the threshold for them to match. In this embodiment, the single-value comparison methods may specifically include exact comparison, edit distance comparison, string relation comparison, time comparison, floating-point number comparison, telephone number comparison, semantic similarity comparison, and the like.
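A few of the single-value comparison methods can be sketched as a lookup table keyed by the singlecmp value. Only the name "Float" and the `threshold` setting come from the text above; "Exact", "EditDistance" and `max_distance` are hypothetical names, and the edit distance shown is the standard Levenshtein distance.

```python
def edit_distance(s, t):
    """Levenshtein distance between two strings (row-by-row DP)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


# Each comparator takes two single values and the supplementary parameters.
SINGLE_COMPARATORS = {
    "Exact": lambda a, b, conf: a == b,
    "Float": lambda a, b, conf: abs(float(a) - float(b)) < conf["threshold"],
    "EditDistance": lambda a, b, conf:
        edit_distance(a, b) <= conf["max_distance"],
}

close_enough = SINGLE_COMPARATORS["Float"](112.0, 112.2, {"threshold": 0.25})
too_far = SINGLE_COMPARATORS["Float"](112.0, 112.5, {"threshold": 0.25})
```

Keeping the comparators in a table means a new comparison method can be added without changing the generated comparison functions.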
The compconf parameter configures supplementary parameters of the comparison function, for example the threshold "threshold: 0.25" mentioned above, or "clear: True", which indicates that the target attribute value is cleaned to remove redundant characters. Other supplementary parameters may also be included and are not described here.
The final return value of the comparison function is a number in the range [0, 1], which then takes effect in the comparison rule.
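The effect of the cleaning switch in compconf on an exact comparison, together with a return value in [0, 1], might look like the sketch below. What counts as a "redundant character" is not specified in the text; treating whitespace, punctuation and letter case as redundant is an assumption, as are the function names.

```python
import re


def clean_value(value):
    """Assumed cleaning: drop whitespace and punctuation, lowercase the rest."""
    return re.sub(r"[\W_]+", "", str(value)).lower()


def exact_score(value_a, value_b, compconf):
    """Return 1.0 when the two values match, 0.0 otherwise."""
    if compconf.get("clear"):
        value_a, value_b = clean_value(value_a), clean_value(value_b)
    return 1.0 if value_a == value_b else 0.0


with_cleaning = exact_score("Tokyo, Japan", "tokyo japan", {"clear": True})
without_cleaning = exact_score("Tokyo, Japan", "tokyo japan", {})
```

This sketch only ever returns 0.0 or 1.0; a similarity-based comparator could return intermediate values within the same range.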
S202, calling a corresponding comparison function according to each comparison rule, and determining a logic operation type to obtain a program code of the comparison rule.
In this embodiment, the comparison rule (PRIO_RULE) is used to combine the comparison functions, perform a logical operation on their results, and finally determine whether the two entities are the same entity.
Each comparison rule may include two elements. The first element specifies the comparison functions to be invoked and the logical operation between their results. For example, the comparison function whose target attribute is the duration and the comparison function whose target attribute is the dubbing actors are invoked by their function names, and the logical operation between their results is AND; that is, two entities are determined to be the same entity only if their durations are the same and their dubbing actors are the same. As another example, the comparison function whose target attribute is the area is invoked by its function name, and a result of 0 determines that the entities are not the same entity; that is, two entities whose areas differ are determined not to be the same entity. The second element identifies whether the entities are the same entity: 1 indicates the same entity, and 0 indicates not the same entity. The final output of the comparison rule may be a Boolean value, TRUE or FALSE.
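The two rule examples above can be sketched as pairs of a condition and a verdict. This is only an illustration under stated assumptions: the attribute keys, the function names, and the convention of returning `None` when a rule does not decide are all hypothetical.

```python
def same_duration(a, b):
    return 1.0 if a["duration"] == b["duration"] else 0.0


def same_dubbing(a, b):
    # loose multi-valued comparison: at least one shared dubbing actor
    return 1.0 if set(a["dubbing"]) & set(b["dubbing"]) else 0.0


def same_area(a, b):
    return 1.0 if a["area"] == b["area"] else 0.0


# Each rule: (condition over comparison-function results, verdict 1 or 0)
RULE_SAME = (lambda a, b: same_duration(a, b) == 1 and same_dubbing(a, b) == 1, 1)
RULE_DIFF = (lambda a, b: same_area(a, b) == 0, 0)


def apply_rule(rule, a, b):
    """Return True (same entity), False (not the same), or None if the rule does not fire."""
    condition, verdict = rule
    if condition(a, b):
        return verdict == 1
    return None


film_1 = {"duration": 112, "dubbing": ["X", "Y"], "area": "JP"}
film_2 = {"duration": 112, "dubbing": ["Y"], "area": "JP"}
film_3 = {"duration": 112, "dubbing": ["Y"], "area": "US"}
```

With this sample data, `RULE_SAME` judges `film_1` and `film_2` to be the same entity, while `RULE_DIFF` judges `film_1` and `film_3` to be different because their areas differ.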
And S203, obtaining a program code corresponding to the entity normalization strategy according to the program codes of the comparison rules.
In this embodiment, after the program codes of the comparison rules are obtained, they may finally be combined into the program code corresponding to the entity normalization policy.
Furthermore, the priority order of the comparison rules set by the user can be received, and the priority is set for the program codes of the comparison rules according to the priority order of the comparison rules, so that the program codes of the comparison rules are operated according to the priority when the program codes corresponding to the entity normalization strategy are operated. In this embodiment, when the program codes of the comparison rules are run according to the priorities, if the comparison rule with the higher priority already determines that the two entities to be compared are the same entity or different entities, the program codes of the comparison rules with the lower priority are not run any more.
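The priority behaviour can be sketched as a loop that stops at the first rule that reaches a decision. In this hedged example a smaller number means a higher priority, rules return `True`, `False`, or `None` (undecided), and a default verdict applies when no rule decides; these conventions and all names are assumptions rather than details from the embodiment.

```python
calls = []  # records which rules actually ran, for illustration only


def rule_area(a, b):
    calls.append("area")
    return False if a["area"] != b["area"] else None


def rule_name(a, b):
    calls.append("name")
    return True if a["name"] == b["name"] else None


def run_policy(rules, a, b, default=False):
    """Run rules in priority order and stop at the first one that decides."""
    for _priority, rule in sorted(rules, key=lambda item: item[0]):
        verdict = rule(a, b)
        if verdict is not None:
            return verdict  # lower-priority rules are not run
    return default


policy = [(2, rule_name), (1, rule_area)]
entity_a = {"name": "n", "area": "JP"}
entity_b = {"name": "n", "area": "US"}
decision = run_policy(policy, entity_a, entity_b)
```

Since the area rule has the higher priority and already decides that the entities differ, the name rule is never invoked for this pair.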
On the basis of any of the above embodiments, the rule parameters include at least one target attribute to be compared, comparison condition parameters corresponding to the target attributes, and comparison rules combined among comparison conditions corresponding to the target attributes.
In the entity normalization processing method provided in each embodiment, the user only needs to input the rule parameters related to the entity normalization policy, and can automatically generate the program code corresponding to the entity normalization policy, so that user programming is not needed, the manpower development cost and the learning cost are reduced, the threshold of data production is reduced, the entity normalization policy is convenient to modify, the efficiency of entity normalization processing is improved, and the method can be applied to entity normalization processing of data in any field. The user interaction interface is used for visual operation, so that the data production cost and the threshold are greatly reduced, the entity normalization strategy is convenient to make and modify, and convenience is brought to the user for flexibly processing the entity data.
An embodiment of the present application provides an entity normalization processing apparatus, and fig. 3 is a structural diagram of the entity normalization processing apparatus according to the embodiment of the present application. As shown in fig. 3, the entity normalization processing apparatus 300 specifically includes: an input module 301, a processing module 302, and an operation module 303.
An input module 301, configured to receive a rule parameter related to an entity normalization policy input by a user;
a processing module 302, configured to generate a program code corresponding to the entity normalization policy according to the rule parameter and a preset code generation rule;
the operation module 303 is configured to run the program code corresponding to the entity normalization policy, and perform normalization determination on entities in a preset entity data set, so as to cluster the same entities.
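The division of work among the three modules can be sketched end to end as follows. This is a simplified illustration, not the apparatus itself: the rule parameters are reduced to a single attribute name, the "preset code generation rule" is a hard-coded template, and clustering uses a small union-find; every function name here is hypothetical.

```python
from itertools import combinations


def input_module(raw):
    """Receive rule parameters (a plain dict stands in for the user's input)."""
    return {"attribute": raw["attribute"]}


def processing_module(params):
    """Generate program code (Python source text) from the rule parameters."""
    attr = params["attribute"]
    return (
        "def is_same_entity(a, b):\n"
        f"    return a[{attr!r}] == b[{attr!r}]\n"
    )


def operation_module(source, entities):
    """Run the generated code and cluster entities judged to be the same."""
    namespace = {}
    exec(source, namespace)  # the source is generated locally in this sketch
    is_same = namespace["is_same_entity"]
    parent = list(range(len(entities)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(entities)), 2):
        if is_same(entities[i], entities[j]):
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(entities)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


params = input_module({"attribute": "name"})
code = processing_module(params)
result = operation_module(code, [{"name": "A"}, {"name": "B"}, {"name": "A"}])
```

The result groups entity indices, so the first and third sample entities end up in one cluster and the second in its own.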
On the basis of the above embodiment, the rule parameters include at least one target attribute to be compared, comparison condition parameters corresponding to the target attributes, and comparison rules combined among comparison conditions corresponding to the target attributes.
On the basis of the foregoing embodiment, the processing module 302 is configured to:
aiming at any target attribute to be compared, acquiring a comparison function of the target attribute according to the type of the target attribute and a comparison condition parameter corresponding to the target attribute;
calling a corresponding comparison function and determining a logic operation type according to each comparison rule to obtain a program code of the comparison rule;
and obtaining a program code corresponding to the entity normalization strategy according to the program code of each comparison rule.
On the basis of the above embodiment, the comparison condition parameters corresponding to the target attributes include types of the target attributes, comparison conditions corresponding to the target attributes, and degrees of strictness of comparison processes.
On the basis of the foregoing embodiment, the processing module 302 is configured to:
determining a comparison method parameter in the comparison function according to the type of the target attribute;
determining a multi-valued comparison condition parameter in the comparison function according to the strictness degree of the comparison process, wherein the multi-valued comparison condition parameter comprises: the multiple values are identical, at least one identical, or completely different;
determining supplementary parameters in the comparison function according to the comparison condition and/or a preset data cleaning instruction;
and obtaining a comparison function of the target attribute according to the target attribute, the comparison method parameter, the multi-valued comparison condition parameter and the supplementary parameters.
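The four steps above can be sketched as a small factory that assembles a comparison function from the target attribute, its type, the strictness, and the supplementary parameters. The two lookup tables and all names are assumptions made for illustration; only "Float", "threshold", and the multicmp mode names come from the description.

```python
# Hypothetical mappings from user-facing settings to function parameters
TYPE_TO_SINGLE = {"float": "Float", "string": "Exact"}
STRICTNESS_TO_MULTICMP = {"strict": "identical", "loose": "at least one identical"}


def _single_comparator(single, threshold):
    """Pick the single-value comparison for the given method name."""
    if single == "Float":
        return lambda x, y: abs(float(x) - float(y)) < threshold
    return lambda x, y: x == y


def build_compare_function(attribute, attr_type, strictness, compconf=None):
    """Assemble a comparison function that returns a number in [0, 1]."""
    compconf = compconf or {}
    single = _single_comparator(TYPE_TO_SINGLE[attr_type],
                                compconf.get("threshold", 1e-9))
    multicmp = STRICTNESS_TO_MULTICMP[strictness]

    def compare(a, b):
        xs, ys = list(a[attribute]), list(b[attribute])
        hit_x = [any(single(x, y) for y in ys) for x in xs]
        hit_y = [any(single(x, y) for x in xs) for y in ys]
        if multicmp == "identical":
            ok = all(hit_x) and all(hit_y)  # every value has a counterpart
        else:
            ok = any(hit_x)                 # at least one value matches
        return 1.0 if ok else 0.0

    return compare


loose_cmp = build_compare_function("score", "float", "loose", {"threshold": 0.25})
strict_cmp = build_compare_function("score", "float", "strict", {"threshold": 0.25})
entity_1 = {"score": [8.0, 9.0]}
entity_2 = {"score": [8.1]}
```

For these sample entities the loose function matches because 8.0 and 8.1 are within the threshold, whereas the strict function does not because 9.0 has no counterpart.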
On the basis of the foregoing embodiment, the processing module 302 is configured to:
and receiving a priority order of the comparison rules set by a user, setting priorities for the program codes of the comparison rules according to the priority order of the comparison rules, and operating the program codes of the comparison rules according to the priorities when operating the program codes corresponding to the entity normalization strategy.
On the basis of the above embodiment, the input module 301 is further configured to receive a start instruction of a user;
the operation module 303 is further configured to run the program code corresponding to the entity normalization policy according to the start instruction; and/or
the input module 301 is further configured to receive a stop instruction of a user;
the operation module 303 is further configured to stop running the program code corresponding to the entity normalization policy according to the stop instruction;
the input module 301 is further configured to receive a result viewing instruction of a user;
the operation module 303 is further configured to display a clustering result according to the result viewing instruction.
The entity normalization processing apparatus provided in this embodiment may be specifically configured to execute the embodiment of the entity normalization processing method provided in the foregoing figures, and specific functions are not described herein again.
The entity normalization processing device provided by the embodiment receives the rule parameters related to the entity normalization strategy input by the user; generating a program code corresponding to the entity normalization strategy according to the rule parameters and a preset code generation rule; and running a program code corresponding to the entity normalization strategy, and carrying out normalization judgment on entities in a preset entity data set so as to cluster the same entities. In the embodiment, a user only needs to input the rule parameters related to the entity normalization strategy, can automatically generate the program codes corresponding to the entity normalization strategy, does not need user programming, reduces the manpower development cost and the learning cost, lowers the threshold of data production, is convenient to modify the entity normalization strategy, improves the efficiency of entity normalization processing, and can be applied to entity normalization processing of data in any field.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 4 is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 4, the electronic device includes: one or more processors 401, memory 402, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 4, one processor 401 is taken as an example.
Memory 402 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the entity normalization processing method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the entity normalization processing method provided by the present application.
The memory 402, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the input module 301, the processing module 302, and the operation module 303 shown in fig. 3) corresponding to the entity normalization processing method in the embodiment of the present application. The processor 401 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 402, that is, implements the entity normalization processing method in the above method embodiment.
The memory 402 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device of the entity normalization processing method, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 402 may optionally include memory located remotely from the processor 401, and such remote memory may be connected over a network to an electronic device that is capable of implementing the normalization processing method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the entity normalization processing method may further include: an input device 403 and an output device 404. The processor 401, the memory 402, the input device 403 and the output device 404 may be connected by a bus or other means, and fig. 4 illustrates an example of a connection by a bus.
The input device 403 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device of the entity normalization processing method, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, and the like. The output devices 404 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiment of the application, the rule parameters related to the entity normalization strategy input by a user are received; generating a program code corresponding to the entity normalization strategy according to the rule parameters and a preset code generation rule; and running a program code corresponding to the entity normalization strategy, and carrying out normalization judgment on entities in a preset entity data set so as to cluster the same entities. In the embodiment, a user only needs to input the rule parameters related to the entity normalization strategy, can automatically generate the program codes corresponding to the entity normalization strategy, does not need user programming, reduces the manpower development cost and the learning cost, lowers the threshold of data production, is convenient to modify the entity normalization strategy, improves the efficiency of entity normalization processing, and can be applied to entity normalization processing of data in any field.
The present application also provides a computer program comprising a program code for performing the entity normalization processing method according to the above embodiment when the computer program is run by a computer.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (17)

1. An entity normalization processing method is characterized by comprising the following steps:
receiving a rule parameter related to an entity normalization strategy input by a user;
generating a program code corresponding to the entity normalization strategy according to the rule parameters and a preset code generation rule;
and running a program code corresponding to the entity normalization strategy, and carrying out normalization judgment on entities in a preset entity data set so as to cluster the same entities.
2. The method according to claim 1, wherein the rule parameters include at least one target attribute to be compared, a comparison condition parameter corresponding to the target attribute, and a comparison rule combined between comparison conditions corresponding to the target attributes.
3. The method according to claim 2, wherein the generating a program code corresponding to the entity normalization policy according to the rule parameter and a preset code generation rule includes:
aiming at any target attribute to be compared, acquiring a comparison function of the target attribute according to the type of the target attribute and a comparison condition parameter corresponding to the target attribute;
calling a corresponding comparison function and determining a logic operation type according to each comparison rule to obtain a program code of the comparison rule;
and obtaining a program code corresponding to the entity normalization strategy according to the program code of each comparison rule.
4. The method according to claim 3, wherein the comparison condition parameters corresponding to the target attribute comprise a type of the target attribute, a comparison condition corresponding to the target attribute, and a strictness degree of a comparison process.
5. The method according to claim 4, wherein the obtaining a comparison function of the target attribute according to the type of the target attribute and the comparison condition parameter corresponding to the target attribute comprises:
determining a comparison method parameter in the comparison function according to the type of the target attribute;
determining a multi-valued comparison condition parameter in the comparison function according to the strictness degree of the comparison process, wherein the multi-valued comparison condition parameter comprises: the multiple values are identical, at least one identical, or completely different;
determining supplementary parameters in the comparison function according to the comparison condition and/or a preset data cleaning instruction;
and obtaining a comparison function of the target attribute according to the target attribute, the comparison method parameter, the multi-valued comparison condition parameter and the supplementary parameter.
6. The method according to claim 3, wherein obtaining the program code corresponding to the entity normalization policy according to the program code of each comparison rule comprises:
and receiving a priority order of the comparison rules set by a user, setting priorities for the program codes of the comparison rules according to the priority order of the comparison rules, and operating the program codes of the comparison rules according to the priorities when operating the program codes corresponding to the entity normalization strategy.
7. The method according to claim 1, wherein the running of the program code corresponding to the entity normalization policy further comprises:
receiving a start instruction of a user, and running the program code corresponding to the entity normalization strategy according to the start instruction; and/or
receiving a stop instruction of a user, and stopping running the program code corresponding to the entity normalization strategy according to the stop instruction;
after clustering the same entities, the method further comprises:
receiving a result viewing instruction of a user, and displaying the clustering result according to the result viewing instruction.
8. An entity normalization processing apparatus, comprising:
the input module is used for receiving the rule parameters related to the entity normalization strategy input by a user;
the processing module is used for generating a program code corresponding to the entity normalization strategy according to the rule parameters and a preset code generation rule;
and the operation module is used for operating the program code corresponding to the entity normalization strategy and carrying out normalization judgment on the entities in the preset entity data set so as to cluster the same entities.
9. The apparatus of claim 8, wherein the rule parameters include at least one target attribute to be compared, a comparison condition parameter corresponding to the target attribute, and a comparison rule combined between comparison conditions corresponding to the target attributes.
10. The apparatus of claim 9, wherein the processing module is configured to:
aiming at any target attribute to be compared, acquiring a comparison function of the target attribute according to the type of the target attribute and a comparison condition parameter corresponding to the target attribute;
calling a corresponding comparison function and determining a logic operation type according to each comparison rule to obtain a program code of the comparison rule;
and obtaining a program code corresponding to the entity normalization strategy according to the program code of each comparison rule.
11. The apparatus according to claim 10, wherein the comparison condition parameters corresponding to the target attribute comprise a type of the target attribute, a comparison condition corresponding to the target attribute, and a strictness degree of a comparison process.
12. The apparatus of claim 11, wherein the processing module is configured to:
determining a comparison method parameter in the comparison function according to the type of the target attribute;
determining a multi-valued comparison condition parameter in the comparison function according to the strictness degree of the comparison process, wherein the multi-valued comparison condition parameter comprises: the multiple values are identical, at least one identical, or completely different;
determining supplementary parameters in the comparison function according to the comparison condition and/or a preset data cleaning instruction;
and obtaining a comparison function of the target attribute according to the target attribute, the comparison method parameter, the multi-valued comparison condition parameter and the supplementary parameters.
13. The apparatus of claim 10, wherein the processing module is configured to:
and receiving a priority order of the comparison rules set by a user, setting priorities for the program codes of the comparison rules according to the priority order of the comparison rules, and operating the program codes of the comparison rules according to the priorities when operating the program codes corresponding to the entity normalization strategy.
14. The apparatus of claim 8,
the input module is further used for receiving a start instruction of a user;
the operation module is further used for running the program code corresponding to the entity normalization strategy according to the start instruction; and/or
the input module is further used for receiving a stop instruction of a user;
the operation module is further used for stopping running the program code corresponding to the entity normalization strategy according to the stop instruction;
the input module is further used for receiving a result viewing instruction of a user;
the operation module is further used for displaying the clustering result according to the result viewing instruction.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.
17. An entity normalization processing method is characterized by comprising the following steps:
receiving a rule parameter related to an entity normalization strategy input by a user;
acquiring an entity normalization strategy according to the rule parameters;
and carrying out normalization judgment on the entities in the preset entity data set according to the entity normalization strategy, and outputting a normalization judgment result.
CN201911379440.1A 2019-12-27 2019-12-27 Entity normalization processing method, device, equipment and storage medium Active CN111158666B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911379440.1A CN111158666B (en) 2019-12-27 2019-12-27 Entity normalization processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911379440.1A CN111158666B (en) 2019-12-27 2019-12-27 Entity normalization processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111158666A true CN111158666A (en) 2020-05-15
CN111158666B CN111158666B (en) 2023-07-04

Family

ID=70558565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911379440.1A Active CN111158666B (en) 2019-12-27 2019-12-27 Entity normalization processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111158666B (en)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JESÚS DANIEL TRIGO: "An Integrated Healthcare Information System for End-to-End Standardized Exchange and Homogeneous Management of Digital ECG Formats", IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, vol. 16, no. 4, pages 518-529, XP011449226, DOI: 10.1109/TITB.2012.2191296 *
吉艳: "基于变值测量的心电数据序列可视化应用研究", 《中国优秀硕士学位论文全文数据库 医药卫生科技辑》, no. 2017, pages 062 - 34 *
驻守、流年: "python学习笔记(十六)可视化操作界面Tkinter", pages 1 - 11, Retrieved from the Internet <URL:https://blog.csdn.net/qq_38830964/article/details/98665317> *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112910923A (en) * 2021-03-04 2021-06-04 麦荣章 Intelligent financial big data processing system
CN113295842A (en) * 2021-04-08 2021-08-24 湖南科技大学 Accurate evaluation system of mine side slope rock mass engineering stability
CN113190670A (en) * 2021-05-08 2021-07-30 重庆第二师范学院 Information display method and system based on big data platform
CN114167198A (en) * 2021-10-18 2022-03-11 国网山东省电力公司平原县供电公司 Method and platform for measuring synchronous line loss data
CN114167198B (en) * 2021-10-18 2024-03-01 国网山东省电力公司平原县供电公司 Method and platform for measuring synchronous line loss data

Also Published As

Publication number Publication date
CN111158666B (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN111310934B (en) Model generation method and device, electronic equipment and storage medium
CN111158666B (en) Entity normalization processing method, device, equipment and storage medium
CN110806923B (en) Parallel processing method and device for block chain tasks, electronic equipment and medium
US11928432B2 (en) Multi-modal pre-training model acquisition method, electronic device and storage medium
JP2021082308A (en) Multimodal content processing method, apparatus, device and storage medium
JP7269913B2 (en) Knowledge graph construction method, device, electronic device, storage medium and computer program
US11573992B2 (en) Method, electronic device, and storage medium for generating relationship of events
CN110532487B (en) Label generation method and device
US20210398022A1 (en) Method and apparatus of fusing operators, electronic device and storage medium
CN112561332B (en) Model management method, device, electronic equipment, storage medium and program product
CN111680517A (en) Method, apparatus, device and storage medium for training a model
CN111061743B (en) Data processing method and device and electronic equipment
CN112528608B (en) Page editing method, page editing device, electronic equipment and storage medium
CN112016524B (en) Model training method, face recognition device, equipment and medium
CN111125451B (en) Data production processing method and device, electronic equipment and storage medium
CN112580723A (en) Multi-model fusion method and device, electronic equipment and storage medium
CN111177479A (en) Method and device for acquiring feature vectors of nodes in relational network graph
CN110909390A (en) Task auditing method and device, electronic equipment and storage medium
CN111783872B (en) Method, device, electronic equipment and computer readable storage medium for training model
US20220121963A1 (en) Network operator processing method, apparatus, electronic device and storage medium
EP3958183A1 (en) Deep learning model adaptation method and apparatus and electronic device
CN111680508B (en) Text processing method and device
CN111325006B (en) Information interaction method and device, electronic equipment and storage medium
CN111340976B (en) Method and device for debugging automatic driving vehicle module and electronic equipment
CN114661274A (en) Method and device for generating intelligent contract

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant