CN114782142A - Commodity information matching method and device, equipment, medium and product thereof

Info

Publication number: CN114782142A
Application number: CN202210502150.7A
Authority: CN
Inventors: 吴智东
Original assignee: Guangzhou Huanju Shidai Information Technology Co Ltd
Current assignee: Guangzhou Huanju Shidai Information Technology Co Ltd
Priority date: 2022-05-09
Filing date: 2022-05-09
Publication date: 2022-07-22

Abstract

The application relates to a commodity information matching method and a corresponding device, equipment, medium and product. The method comprises the following steps: acquiring commodity information of a commodity entity to be detected, wherein the commodity information comprises a plurality of description texts, and extracting a knowledge subgraph of the commodity information; retrieving, from a knowledge graph, a plurality of commodity entities whose knowledge subgraphs match the knowledge subgraph of the commodity entity to be detected, to form a candidate commodity set; acquiring a sample set corresponding to each commodity entity in the candidate commodity set; and semantically matching the commodity information of the commodity entity to be detected against the sample sets of the commodity entities in the candidate commodity set one by one, and determining the commodity entity that matches the commodity entity to be detected. The technical scheme is suitable for commodity infringement detection: when it must be judged whether the commodity entity to be detected infringes on certain commodity entities, matching precision is further improved on top of broad recall, so that missed detections are avoided through the attribute coincidence degree and infringing commodities can be accurately identified.

Description

Commodity information matching method and device, equipment, medium and product thereof

Technical Field

The present application relates to the field of e-commerce information technologies, and in particular, to a method for matching merchandise information, and a corresponding apparatus, computer device, computer-readable storage medium, and computer program product.

Background

With the increasing maturity of internet technology, the e-commerce industry is developing rapidly, and a large number of merchants choose to sell commodities through e-commerce channels. The commodities are of many types and of uneven quality, and a number of merchants sell infringing commodities. Selling infringing commodities seriously harms the legitimate interests of the original brands. Detecting infringing commodities among a large quantity of commodities is therefore an effective means of protecting the legitimate interests of brand owners.

Although e-commerce platforms have a variety of methods for detecting whether commodities sold by merchants are infringing, infringing commodities cannot be fully detected because merchants use various means of evasion. Existing infringement detection methods maintain a brand lexicon and use it for fuzzy matching against commodities. However, this approach cannot detect a commodity whose text contains no brand word, resulting in a large number of missed detections. For example, some merchants intentionally remove the brand name of a commodity, leaving only the description, model number and the like, to circumvent the platform's detection of infringing commodities. Meanwhile, many brand names are also common phrases that can appear in the text of unrelated, non-brand commodities, so the traditional detection method produces false detections in such cases.

Disclosure of Invention

An object of the present application is to provide a method for matching merchandise information, and a corresponding apparatus, computer device, computer readable storage medium, computer program product, adapted to solve at least one of the above problems, and to adopt the following technical solutions:

in one aspect, a method for matching merchandise information is provided, which comprises the following steps:

acquiring commodity information of a commodity entity to be detected, wherein the commodity information comprises a plurality of description texts, and extracting a knowledge subgraph of the commodity information, wherein the knowledge subgraph comprises mapping relation data between attributes and attribute values extracted from the description texts;

retrieving, from a knowledge graph, a plurality of commodity entities whose knowledge subgraphs match the knowledge subgraph of the commodity entity to be detected, to form a candidate commodity set, wherein the knowledge graph stores the knowledge subgraphs corresponding to the commodity entities in a commodity database;

acquiring a sample set corresponding to each commodity entity in a candidate commodity set, wherein each attribute in a knowledge subgraph corresponding to the sample set comprises one or more description texts, and the description texts in all the sample sets are organized according to a unified sequence;

and performing semantic matching on the commodity information of the commodity entity to be detected and the sample set of the commodity entities in the candidate commodity set one by one, and determining the commodity entity that matches the commodity entity to be detected.

In some further embodiments, acquiring the commodity information of the commodity entity to be detected comprises the following steps:

responding to a commodity release request of an online shop, and acquiring commodity information of a to-be-detected commodity entity corresponding to the request, wherein the commodity information comprises any one or more items of a commodity title, commodity details and a description text corresponding to commodity attribute data of the commodity entity;

extracting attributes of the commodity information of the commodity entity to be detected so as to extract corresponding attribute values from each description text according to different attributes to form mapping relation data between the attributes and the attribute values;

and constructing mapping relation data between the attributes and the attribute values of the commodity entity to be detected into corresponding knowledge subgraphs according to a preset knowledge graph structure.
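The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `ATTRIBUTE_LEXICON` lookup is a hypothetical stand-in for the attribute extraction step, whitespace splitting stands in for word segmentation, and the `KnowledgeSubgraph` class with its `(entity, attribute, value)` triples is only one possible "preset knowledge graph structure".

```python
from dataclasses import dataclass, field

# Hypothetical lexicon standing in for a trained attribute extraction model.
ATTRIBUTE_LEXICON = {
    "season": {"summer", "winter"},
    "material": {"ice-silk", "cotton"},
    "category": {"coat", "shirt"},
}


@dataclass
class KnowledgeSubgraph:
    entity_id: str
    # Mapping relation data: attribute -> set of attribute values.
    attributes: dict = field(default_factory=dict)

    def triples(self):
        """Return (entity, attribute, value) edges in a stable order."""
        return sorted(
            (self.entity_id, attr, value)
            for attr, values in self.attributes.items()
            for value in values
        )


def extract_subgraph(entity_id, description_texts):
    """Build a subgraph from all description texts of one commodity entity."""
    subgraph = KnowledgeSubgraph(entity_id)
    for text in description_texts:
        # Whitespace splitting is a simplification of word segmentation.
        for token in text.lower().split():
            for attr, vocabulary in ATTRIBUTE_LEXICON.items():
                if token in vocabulary:
                    subgraph.attributes.setdefault(attr, set()).add(token)
    return subgraph


sg = extract_subgraph(
    "item-001",
    ["Summer ice-silk loose coat", "Coat for summer wear"],
)
print(sg.triples())
```

Storing values as sets means that the same attribute value found in both the title and the details is recorded only once.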

In some further embodiments, before the step of acquiring the commodity information of the commodity entity to be detected, the method comprises the following steps:

and constructing a knowledge graph corresponding to a commodity database, wherein the knowledge graph comprises knowledge subgraphs corresponding to all commodity entities in the commodity database, and the knowledge subgraphs comprise mapping relation data between attributes and attribute values extracted from description texts of commodity information of the corresponding commodity entities.

In some further embodiments, constructing the knowledge graph corresponding to the commodity database comprises the following steps:

creating a knowledge graph, and acquiring commodity information of each commodity entity in a commodity database;

extracting the attributes of the commodity information of each commodity entity to extract corresponding attribute values from each description text according to different attributes to form mapping relation data between the attributes and the attribute values;

constructing mapping relation data between the attribute and the attribute value of each commodity entity into a knowledge subgraph of the corresponding commodity entity according to a preset structure of the knowledge graph;

different description versions of the commodity information of each commodity entity are obtained, each description version comprises a description text matched with the mapping relation data between each attribute and the attribute value of the commodity entity, samples corresponding to each description version are constructed according to a unified sequence, all samples form a sample set of the corresponding commodity entity, and the sample set is stored in a knowledge subgraph of the corresponding commodity entity.
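The sample-set construction in the last step can be sketched as below. The attribute names, the `ATTRIBUTE_ORDER` list, and the dictionary layout of a subgraph are assumptions made for illustration; the source only requires that every sample follow one unified order and that the sample set be stored with the entity's knowledge subgraph.

```python
# Assumed unified attribute order shared by every sample (hypothetical names).
ATTRIBUTE_ORDER = ["brand", "category", "material", "season"]


def build_sample(version):
    """Join one description version's texts in the unified attribute order."""
    return " ".join(version[attr] for attr in ATTRIBUTE_ORDER if attr in version)


def build_sample_set(description_versions):
    """Build one sample per description version, skipping exact duplicates."""
    samples = []
    for version in description_versions:
        sample = build_sample(version)
        if sample and sample not in samples:
            samples.append(sample)
    return samples


# Two description versions of the same commodity entity: each maps an
# attribute to the description text used for it in that version.
versions = [
    {"season": "summer", "category": "coat", "material": "ice silk"},
    {"material": "cool-feel fabric", "category": "windbreaker", "season": "summer"},
]

# The sample set is stored alongside the entity's attribute mapping.
subgraph = {"attributes": {"category": {"coat"}, "season": {"summer"}}}
subgraph["samples"] = build_sample_set(versions)
print(subgraph["samples"])
```

Because both versions are serialized in the same attribute order, texts that describe the same attribute end up in the same position, which keeps samples comparable during later matching.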

In some further embodiments, retrieving from the knowledge graph a plurality of commodity entities whose knowledge subgraphs match the knowledge subgraph of the commodity entity to be detected, to form a candidate commodity set, comprises the following steps:

acquiring mapping relation data between attributes and attribute values in the knowledge subgraph of the entity to be detected as an attribute set of the entity to be detected;

performing coincidence degree matching calculation on the attribute set of the commodity entity to be detected and the attribute set corresponding to each commodity entity in the knowledge graph, and determining the attribute coincidence degree between each commodity entity in the knowledge graph and the commodity entity to be detected;

and combining the commodity entities in the knowledge graph whose attribute coincidence degree meets a preset condition into a candidate commodity set.
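A minimal sketch of this recall step follows. The text above does not fix a formula for the attribute coincidence degree, so Jaccard similarity over (attribute, value) pairs is used here purely as an assumption, and the `threshold` parameter stands in for the unspecified preset condition.

```python
def attribute_set(subgraph):
    """Flatten attribute -> values mapping data into (attribute, value) pairs."""
    return {(attr, value) for attr, values in subgraph.items() for value in values}


def coincidence_degree(query, candidate):
    """Jaccard overlap of two attribute sets (an assumed measure)."""
    q, c = attribute_set(query), attribute_set(candidate)
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def recall_candidates(query, knowledge_graph, threshold=0.5):
    """Keep entities whose coincidence degree meets the assumed threshold."""
    scored = {
        entity_id: coincidence_degree(query, subgraph)
        for entity_id, subgraph in knowledge_graph.items()
    }
    return {eid: score for eid, score in scored.items() if score >= threshold}


# The query carries no brand attribute, as in the evasion case described
# in the Background section.
query = {"season": {"summer"}, "material": {"ice-silk"}, "category": {"coat"}}
graph = {
    "A": {
        "brand": {"a"},
        "season": {"summer"},
        "material": {"ice-silk"},
        "category": {"coat"},
    },
    "B": {"season": {"winter"}, "category": {"coat"}},
}
candidates = recall_candidates(query, graph)
print(candidates)
```

Entity "A" is still recalled even though the query omits the brand, since three of the four attribute pairs coincide; a brand-lexicon lookup alone would have missed it.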

In some further embodiments, performing semantic matching on the commodity information of the commodity entity to be detected and the sample set of the commodity entities in the candidate commodity set one by one, and determining the commodity entity that matches the commodity entity to be detected, comprises the following steps:

calculating a first document similarity between the commodity information of the commodity entity to be detected and the sample set of each commodity entity in the candidate commodity set, taking the attribute coincidence degree corresponding to that commodity entity as the weight of the first document similarity to obtain a weighted similarity, and screening out the commodity entities whose weighted similarity meets a preset condition to form a first commodity set;

calculating semantic similarity between the semantic vector of the commodity information of the commodity entity to be detected and the semantic vector of the sample set of the commodity entities in the first commodity set, and screening out commodity entities with semantic similarity meeting preset conditions to form a second commodity set;

and pushing the commodity entities in the second commodity set, as the commodity entities matching the commodity entity to be detected, to the terminal equipment that provided the commodity information of the commodity entity to be detected.
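The two screening stages can be sketched as follows. Several choices here are assumptions rather than details given above: the first document similarity is approximated by term-count cosine similarity taken as the maximum over an entity's samples, the semantic vectors are supplied as precomputed lists instead of being produced by a trained model, and both thresholds are arbitrary illustrative values.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def document_similarity(query_text, samples):
    """Best term-count cosine between the query text and any sample."""
    best = 0.0
    q_counts = Counter(query_text.lower().split())
    for sample in samples:
        s_counts = Counter(sample.lower().split())
        vocab = sorted(set(q_counts) | set(s_counts))
        q_vec = [q_counts[t] for t in vocab]
        s_vec = [s_counts[t] for t in vocab]
        best = max(best, cosine(q_vec, s_vec))
    return best


def first_screen(query_text, candidates, threshold):
    """candidates maps entity id -> (attribute coincidence degree, sample set)."""
    first_set = {}
    for entity_id, (coincidence, samples) in candidates.items():
        weighted = coincidence * document_similarity(query_text, samples)
        if weighted >= threshold:
            first_set[entity_id] = weighted
    return first_set


def second_screen(query_vector, entity_vectors, first_set, threshold):
    """Keep first-set entities whose semantic vectors are close to the query."""
    return [
        entity_id
        for entity_id in first_set
        if cosine(query_vector, entity_vectors[entity_id]) >= threshold
    ]


candidates = {
    "A": (0.75, ["summer ice-silk coat", "cool summer windbreaker"]),
    "B": (0.6, ["winter wool coat"]),
}
first_set = first_screen("summer ice-silk coat", candidates, threshold=0.5)

# Precomputed semantic vectors are placeholders for model output.
entity_vectors = {"A": [0.8, 0.2], "B": [0.1, 0.9]}
second_set = second_screen([0.9, 0.1], entity_vectors, first_set, threshold=0.9)
print(first_set, second_set)
```

Weighting the document similarity by the attribute coincidence degree means an entity must agree with the query both in structured attributes and in wording to survive the first stage, which leaves fewer entities for the costlier semantic comparison.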

In a further embodiment, before the step of calculating the semantic similarity between the semantic vector of the commodity information of the commodity entity to be detected and the semantic vector of the sample set of the commodity entities in the first commodity set, the method comprises the following steps:

performing iterative training on a preset text feature extraction model to a convergence state by using a sample set of commodity entities in a knowledge graph as a training sample;

and extracting, with the text feature extraction model, the semantic vector of the sample set of each commodity entity in the first commodity set and the semantic vector of the commodity information of the commodity entity to be detected, respectively.

In another aspect, a commodity information matching device adapted to one of the objects of the present application is provided, which includes:

the information extraction module is used for acquiring commodity information of a commodity entity to be detected, wherein the commodity information comprises a plurality of description texts and extracting a knowledge subgraph thereof, and the knowledge subgraph comprises mapping relation data between attributes and attribute values extracted from the description texts;

the retrieval matching module is used for retrieving, from a knowledge graph, a plurality of commodity entities whose knowledge subgraphs match the knowledge subgraph of the commodity entity to be detected, to form a candidate commodity set, wherein the knowledge graph stores the knowledge subgraphs corresponding to the commodity entities in a commodity database;

the extraction and sequencing module is used for acquiring a sample set corresponding to each commodity entity in the candidate commodity set, each attribute in the knowledge subgraph corresponding to the sample set comprises one or more description texts, and all the description texts in the sample set are organized according to a unified sequence;

and the semantic matching module is used for performing semantic matching on the commodity information of the commodity entity to be detected and the sample set of the commodity entities in the candidate commodity set one by one, and determining the commodity entity matched with the commodity entity to be detected.

In some further embodiments, the information extraction module includes: an information extraction unit, used for responding to a commodity release request of an online shop and acquiring commodity information of the to-be-detected commodity entity corresponding to the request, wherein the commodity information comprises any one or more of a commodity title, commodity details and a description text corresponding to commodity attribute data of the commodity entity; an attribute extraction unit, used for performing attribute extraction on the commodity information of the commodity entity to be detected, so as to extract corresponding attribute values from the description texts according to different attributes and form mapping relation data between the attributes and the attribute values; and a subgraph construction unit, used for constructing the mapping relation data between the attributes and the attribute values of the commodity entity to be detected into the corresponding knowledge subgraph according to the preset knowledge graph structure.

In some further embodiments, the commodity information matching device further includes: a graph construction module, used for constructing a knowledge graph corresponding to the commodity database, wherein the knowledge graph comprises knowledge subgraphs corresponding to each commodity entity in the commodity database, and the knowledge subgraphs comprise mapping relation data between attributes and attribute values extracted from the description texts of the commodity information of the corresponding commodity entities.

In some further embodiments, the graph construction module includes: a graph creating unit, used for creating a knowledge graph and acquiring commodity information of each commodity entity in the commodity database; a mapping construction unit, used for performing attribute extraction on the commodity information of each commodity entity, so as to extract corresponding attribute values from each description text according to different attributes and form mapping relation data between the attributes and the attribute values; a subgraph creating unit, used for constructing the mapping relation data between the attributes and the attribute values of each commodity entity into a knowledge subgraph of the corresponding commodity entity according to the preset structure of the knowledge graph; and a subgraph matching unit, used for acquiring different description versions of the commodity information of each commodity entity, wherein each description version comprises a description text matched with the mapping relation data between each attribute and the attribute value of the commodity entity, constructing samples corresponding to each description version according to a unified sequence, forming a sample set of the corresponding commodity entity from all the samples, and storing the sample set in the knowledge subgraph of the corresponding commodity entity.

In some further embodiments, the retrieval matching module includes: a mapping merging unit, used for acquiring the mapping relation data between the attributes and the attribute values in the knowledge subgraph of the commodity entity to be detected as an attribute set of the commodity entity to be detected; a coincidence calculation unit, used for performing coincidence degree matching calculation on the attribute set of the commodity entity to be detected and the attribute set corresponding to each commodity entity in the knowledge graph, and determining the attribute coincidence degree between each commodity entity in the knowledge graph and the commodity entity to be detected; and a candidate screening unit, used for combining the commodity entities in the knowledge graph whose attribute coincidence degree meets a preset condition into a candidate commodity set.

In some further embodiments, the semantic matching module includes: a first screening unit, used for calculating a first document similarity between the commodity information of the commodity entity to be detected and the sample set of each commodity entity in the candidate commodity set, taking the attribute coincidence degree corresponding to that commodity entity as the weight of the first document similarity to obtain a weighted similarity, and screening out the commodity entities whose weighted similarity meets a preset condition to form a first commodity set; a second screening unit, used for calculating the semantic similarity between the semantic vector of the commodity information of the commodity entity to be detected and the semantic vector of the sample set of the commodity entities in the first commodity set, and screening out the commodity entities whose semantic similarity meets a preset condition to form a second commodity set; and a commodity pushing unit, used for pushing the commodity entities in the second commodity set, as the commodity entities matching the commodity entity to be detected, to the terminal equipment that provided the commodity information of the commodity entity to be detected.

In some further embodiments, the semantic matching module further includes: a model training unit, used for iteratively training a preset text feature extraction model to a convergence state by using the sample sets of the commodity entities in the knowledge graph as training samples; and a vector extraction unit, used for extracting, with the text feature extraction model, the semantic vector of the sample set of each commodity entity in the first commodity set and the semantic vector of the commodity information of the commodity entity to be detected, respectively.

In yet another aspect, a computer device adapted to one of the objects of the present application is provided, which includes a central processing unit and a memory, wherein the central processing unit is used for invoking and running a computer program stored in the memory to execute the steps of the merchandise information matching method described in the present application.

In still another aspect, a computer-readable storage medium adapted to one of the objects of the present application is provided, which stores, in the form of computer-readable instructions, a computer program implemented according to the commodity information matching method; when invoked by a computer, the computer program executes the steps included in the method.

In a further aspect, a computer program product is provided, which comprises computer program/instructions, which when executed by a processor, implement the steps of the method as described in any one of the embodiments of the present application.

Compared with the prior art, the application has various advantages, including at least the following aspects:

Firstly, for the matching scenario of e-commerce detection, the knowledge subgraph corresponding to the commodity entity to be detected is extracted, and the attribute coincidence degree is calculated between the mapping relation data between attributes and attribute values in the knowledge subgraph of the commodity entity to be detected and the mapping relation data between attributes and attribute values in the knowledge subgraphs of the commodity entities in the knowledge graph. A plurality of knowledge subgraphs similar to that of the commodity entity to be detected are thereby screened from the knowledge graph to form a candidate commodity set. The description texts corresponding to the mapping relation data between attributes and attribute values of the knowledge subgraphs in the candidate commodity set are then recalled, the semantic vectors of these description texts and of the commodity information of the commodity entity to be detected are extracted, and the semantic vector of the commodity entity to be detected is semantically matched against the semantic vectors of the commodity entities in the candidate commodity set, so as to screen out from the candidate commodity set the commodity entities that are the same as or similar to the commodity entity to be detected. On this basis, commodities that are the same as or similar to the commodity to be detected can be screened from the commodity database efficiently.

Secondly, the attribute coincidence degree between the commodity entity to be detected and each commodity entity in the knowledge graph is used to perform retrieval and recall, yielding a candidate commodity set; then, the commodity information of the commodity entity to be detected is semantically matched against the sample set of each commodity entity in the candidate commodity set, achieving fine ranking of the commodity entities in the candidate commodity set and thereby screening out the commodity entity that matches the commodity entity to be detected. Because each commodity entity in the candidate commodity set has a corresponding sample set prepared in advance, each attribute in that sample set comprises one or more description texts, and these description texts are representations of the same commodity entity in different forms, the reference range of the commodity information of the commodity entity is expanded and richer reference information is provided for semantic matching, so that a more accurate matching result can be obtained and retrieval accuracy is ensured.

In addition, the technical scheme of the application is suitable for commodity infringement detection. When it must be judged whether the commodity entity to be detected infringes on certain historical commodity entities, matching precision is further improved on top of broad recall: missed detections can be avoided through the attribute coincidence degree, partially missing commodity information of the commodity entity to be detected can be compensated for by the reference information provided by the sample sets, and infringing commodities can be accurately identified.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

Fig. 1 is a schematic flowchart of an exemplary embodiment of the commodity information matching method according to the present application.

Fig. 2 is a schematic flow chart illustrating a process of acquiring commodity information of a commodity entity to be detected in an embodiment of the present application.

Fig. 3 is a schematic flow chart of a process of constructing a knowledge sub-graph for a commodity entity to be detected in an embodiment of the present application.

Fig. 4 is a flowchart illustrating a process of constructing a knowledge graph corresponding to a commodity database according to an embodiment of the present application.

Fig. 5 is a flowchart illustrating a process of acquiring a candidate product set according to an embodiment of the present application.

Fig. 6 is a flowchart illustrating a process of determining a commodity entity that matches a to-be-detected commodity entity in an embodiment of the present application.

Fig. 7 is a flowchart illustrating a process of extracting a semantic vector according to an embodiment of the present application.

Fig. 8 is a functional block diagram of the commodity information matching apparatus of the present application.

Fig. 9 is a schematic structural diagram of a computer device used in the present application.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.

As used herein, the singular forms "a", "an", "said" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any one of, and all combinations of, one or more of the associated listed items.

It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

As used herein, "client," "terminal," and "terminal device" include both devices having only a wireless signal receiver without transmit capability and devices having receiving and transmitting hardware capable of two-way communication over a two-way communication link, as will be understood by those skilled in the art. Such a device may include: a cellular or other communication device, such as a personal computer or tablet, with a single-line or multi-line display, or a cellular or other communication device without a multi-line display; a PCS (Personal Communications Service) device, which may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant) that may include a radio frequency receiver, a pager, internet/intranet access, web browser, notepad, calendar, and/or GPS (Global Positioning System) receiver; a conventional laptop and/or palmtop computer or other appliance having and/or including a radio frequency receiver. As used herein, a "client" or "terminal device" can be portable, transportable, installed in a vehicle (aeronautical, maritime, and/or land-based), or situated and/or configured to operate locally and/or in a distributed fashion at any other location(s) on earth and/or in space. The "client" or "terminal device" used herein may also be a communication terminal, an Internet access terminal, or a music/video playing terminal, for example a PDA, an MID (Mobile Internet Device), and/or a mobile phone with music/video playing function, and may also be a smart television, a set-top box, or another such device.

The hardware referred to by the names "server", "client", "service node", etc. is essentially an electronic device with the performance of a personal computer, namely a hardware device having the necessary components disclosed by the von Neumann principle, such as a central processing unit (including an arithmetic unit and a controller), a memory, an input device and an output device. A computer program is stored in the memory, and the central processing unit loads the program stored in external memory into internal memory to run it, executes the instructions in the program, and interacts with the input and output devices, thereby completing a specific function.

It should be noted that the concept of "server" in the present application can be extended to the case of a server cluster. According to the network deployment principle understood by those skilled in the art, the servers should be logically divided; in physical space, they may be independent of each other but callable through an interface, or may be integrated into one physical computer or one computer cluster. Those skilled in the art should understand such variations, which should not be taken to constrain the network deployment of the present application.

One or more technical features of the present application, unless expressly specified otherwise, may be deployed on a server and accessed by a client remotely invoking an online service interface provided by the server, or may be deployed and run directly on the client for access.

Unless expressly specified otherwise, any neural network model referred to or possibly referred to in the application can be deployed on a remote server and called remotely from a client, or can be deployed on a client with sufficient device capability and called directly.

Unless expressly specified otherwise, the various data referred to in the present application may be stored remotely on a server or locally on a terminal device, as long as the data is suitable for being called by the technical solution of the present application.

Those skilled in the art will appreciate that although the various methods of the present application are described based on the same concept and therefore share common ground, they may be performed independently unless otherwise specified. Likewise, the embodiments disclosed in the present application are all proposed based on the same inventive concept; therefore, concepts expressed in the same way should be understood in the same way, and concepts that are expressed differently merely for convenience should likewise be understood as equivalent.

Unless a mutually exclusive relationship between the related technical features is expressly stated, the embodiments to be disclosed herein can be flexibly constructed by cross-combining the related technical features of the embodiments, as long as the combination does not depart from the inventive spirit of the present application and can meet the needs of the prior art or remedy its deficiencies. Those skilled in the art will appreciate such variations.

The commodity information matching method of the present application can be programmed into a computer program product and deployed to run on a client or a server. For example, in an exemplary application scenario of the present application, the method can be deployed in a server of an e-commerce platform, so that, once the computer program product is running, the method can be executed by accessing its open interface and interacting with the process of the computer program product through a graphical user interface.

Referring to fig. 1, in an exemplary embodiment of the method for matching commodity information of the present application, the method includes the following steps:

Step S1100, acquiring commodity information of a commodity entity to be detected, wherein the commodity information comprises a plurality of description texts, and extracting a knowledge subgraph of the commodity information, wherein the knowledge subgraph comprises mapping relation data between attributes and attribute values extracted from the description texts;

The commodity information is the descriptive information of a commodity published by a user. A commodity released on the e-commerce platform is defined by its commodity information, so that a commodity entity is defined at the data layer and the commodity information of the commodity can be called. The commodity information comprises description texts and commodity pictures corresponding to the commodity. It generally includes a plurality of description texts, including but not limited to title text, detail text, commodity attribute data, and the like.

The commodity entity to be detected needs to be matched against identical or similar commodity entities. In an exemplary e-commerce scenario, one or several merchant users hold the sales rights to a certain commodity while other merchants do not. To profit from sales, those other merchant users submit commodity information in versions that differ from the original commodity, in order to evade the infringement detection of the e-commerce platform. To protect the interests of the merchant users who hold the sales rights, the e-commerce platform performs infringing-commodity detection on listed commodities and prevents merchant users without sales rights from infringing those interests.

The mapping relation data between attributes and attribute values is obtained by performing word segmentation on the description texts of the commodity entity, classifying the resulting words by attribute through an attribute extraction model, and taking the words as attribute values. For example, for the commodity title "certain brand 2021 summer ultraviolet-proof ice-silk loose comfortable wind coat M15237", word segmentation yields a plurality of segmented words, which are classified by attribute to form mapping relation data between attributes and attribute values, expressed as follows:

("brand", "certain brand"), ("applicable season", "summer"), ("function", "ultraviolet-proof"), ("fabric", "ice silk"), ("fit", "loose"), ("style", "wind coat"), ("model number", "M15237")

Of course, when the same description text is classified under different mapping schemes, the resulting mapping relation data between attributes and attribute values may differ slightly. Those skilled in the art will understand this, and the examples herein should not limit the scope covered by the inventive spirit of the present application.

After attribute extraction has been performed on the commodity information of the commodity entity to be detected and the mapping relation data between attributes and attribute values has been obtained, a knowledge subgraph corresponding to the commodity entity to be detected is established according to the structure of the knowledge graph of the present application, and the mapping relation data is stored in the knowledge subgraph.
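As an illustration of the data involved, the following is a minimal Python sketch of a knowledge subgraph holding (attribute, attribute value) pairs. The class name `KnowledgeSubgraph`, the helper `build_subgraph`, and the identifier `item-to-detect` are hypothetical and are not taken from the application; the application does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class KnowledgeSubgraph:
    """Hypothetical container: one commodity entity and its (attribute, value) pairs."""
    entity_id: str
    pairs: List[Tuple[str, str]] = field(default_factory=list)

    def values_of(self, attribute: str) -> List[str]:
        # An attribute may map to several values, so a list is returned.
        return [value for attr, value in self.pairs if attr == attribute]


def build_subgraph(entity_id: str, extracted_pairs: List[Tuple[str, str]]) -> KnowledgeSubgraph:
    # Drop exact duplicate pairs while keeping the first-seen order.
    seen = set()
    unique = []
    for pair in extracted_pairs:
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    return KnowledgeSubgraph(entity_id, unique)


# Illustrative pairs in the style of the example above.
pairs = [
    ("brand", "certain brand"),
    ("applicable season", "summer"),
    ("fabric", "ice silk"),
    ("style", "wind coat"),
    ("fabric", "ice silk"),  # duplicate, removed by build_subgraph
]
subgraph = build_subgraph("item-to-detect", pairs)
```

Keeping the pairs as an ordered list rather than a dictionary allows one attribute to carry several attribute values, which the later steps rely on.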

Step S1200, retrieving, from a knowledge graph, a plurality of commodity entities whose knowledge subgraphs match the knowledge subgraph of the commodity entity to be detected, to form a candidate commodity set, wherein the knowledge graph stores the knowledge subgraphs corresponding to the commodity entities in a commodity database;

A knowledge graph is prepared in advance. It represents, in the form of a graph structure, each commodity entity in the commodity database and the connections between the attributes and attribute values of each commodity entity. Because each commodity entity is organized according to the graph structure, a knowledge subgraph can be generated for it, and the knowledge subgraph represents the commodity entity together with the mapping relation data between its attributes and attribute values.

After the server acquires the knowledge subgraph of the commodity entity to be detected, it searches the knowledge graph for knowledge subgraphs whose attribute coincidence degree with that knowledge subgraph meets a preset condition. The plurality of commodity entities found in this way are taken as candidate commodity entities and assembled into a candidate commodity set.

Specifically, the server obtains the mapping relation data between attributes and attribute values contained in the knowledge subgraph of the commodity entity to be detected, and the corresponding mapping relation data contained in every knowledge subgraph of the knowledge graph. For each knowledge subgraph in the knowledge graph, it counts the items of mapping relation data that coincide with those of the commodity entity to be detected, thereby obtaining the attribute coincidence degree. It then screens out the knowledge subgraphs whose attribute coincidence degree meets the preset condition and takes the commodity entities corresponding to those knowledge subgraphs as the candidate commodity set. In other words, by calculating the attribute coincidence degree, the server screens out a plurality of commodity entities that meet the preset condition, which reduces the amount of data to be processed in the next step and speeds up that step. Because matching commodity entities are found through the coincidence between attributes, and attribute data is relatively accurate at the time a commodity is sold, commodity recall remains effective, and as many candidate commodity entities as possible that resemble the commodity entity to be detected in composition are obtained in the recall stage.
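The recall step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the attribute coincidence degree is taken to be the plain count of shared (attribute, attribute value) pairs, and the preset condition is taken to be a minimum count `min_overlap`. The function names, the threshold, and the sample data are hypothetical; the application may define the measure differently.

```python
def attribute_overlap(query_pairs, candidate_pairs):
    """Count the (attribute, value) pairs shared by two knowledge subgraphs."""
    return len(set(query_pairs) & set(candidate_pairs))


def recall_candidates(query_pairs, graph, min_overlap=2):
    """Return (entity_id, overlap) for entities meeting the assumed preset condition."""
    scored = []
    for entity_id, pairs in graph.items():
        overlap = attribute_overlap(query_pairs, pairs)
        if overlap >= min_overlap:  # assumed form of the preset condition
            scored.append((entity_id, overlap))
    # Highest overlap first; ties broken by entity id for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


query = [
    ("brand", "certain brand"),
    ("applicable season", "summer"),
    ("fabric", "ice silk"),
    ("style", "wind coat"),
]
# Made-up knowledge graph contents: entity id -> attribute-value pairs.
graph = {
    "sku-001": [("brand", "certain brand"), ("applicable season", "summer"), ("fabric", "ice silk")],
    "sku-002": [("brand", "certain brand"), ("applicable season", "winter"), ("fabric", "wool")],
    "sku-003": [("brand", "other brand"), ("fabric", "ice silk"), ("style", "wind coat")],
}
candidates = recall_candidates(query, graph)
```

Here `sku-002` shares only the brand pair with the query and is therefore left out of the candidate set.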

Step S1300, acquiring a sample set corresponding to each commodity entity in the candidate commodity set, wherein, in the sample set, each attribute of the corresponding knowledge subgraph is associated with one or more description texts, and the description texts in all sample sets are organized in a unified order;

The knowledge subgraph corresponding to each commodity entity in the candidate commodity set is acquired, and, based on the mapping relation data between attributes and attribute values in that knowledge subgraph, the description texts corresponding to the mapping relation data are recalled from the commodity database.

Because the same commodity entity may have different versions of commodity information, the attribute values extracted for the same attribute from the different versions may differ; that is, the same attribute may correspond to multiple attribute values.

Based on the mapping relation data between one attribute and its attribute values in the knowledge subgraph of a commodity entity, the server recalls a plurality of corresponding description texts from the multiple versions of the commodity information of the same commodity, each description text belonging to a different version of the commodity information.

The server organizes the one or more description texts recalled from the commodity database for each attribute-to-attribute-value mapping according to the attribute organization order of the knowledge subgraph in which that mapping relation data resides.

Based on the mapping relations between all attributes and attribute values in a knowledge subgraph of the candidate commodity set, the server recalls from the commodity database the one or more description texts corresponding to each item of mapping relation data, and organizes the description texts of all such mappings in the same attribute organization order, forming the sample set of that knowledge subgraph, that is, of the commodity entity.

It is easy to understand that by providing a sample set for the same commodity entity and including different version description texts corresponding to the attributes of the commodity entity in the sample set, the reference information of the commodity entity can be enriched.
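One way to picture a sample set organized in a unified order is the Python sketch below. The attribute order `ATTRIBUTE_ORDER`, the function names, and the description texts are all made up for illustration, and filling a missing attribute with an empty string is an assumption rather than something the application specifies.

```python
# Assumed shared attribute order used by every sample set.
ATTRIBUTE_ORDER = ["brand", "applicable season", "fabric"]


def build_sample(version_texts, order=ATTRIBUTE_ORDER):
    """Arrange one version's description texts in the shared attribute order."""
    # A version that lacks a text for some attribute gets an empty placeholder.
    return [version_texts.get(attribute, "") for attribute in order]


def build_sample_set(versions, order=ATTRIBUTE_ORDER):
    """Build one sample per description version of the same commodity entity."""
    return [build_sample(version, order) for version in versions]


# Two made-up description versions of the same commodity entity,
# keyed by the attribute whose mapping recalled the text.
versions = [
    {"fabric": "ice silk coat", "brand": "certain brand summer coat"},
    {
        "brand": "certain brand sun-protective jacket",
        "applicable season": "for summer wear",
        "fabric": "cool ice-silk fabric",
    },
]
sample_set = build_sample_set(versions)
```

Because every sample follows the same attribute order, position `i` in any sample always refers to the same attribute, which keeps samples comparable across commodity entities.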

In one embodiment, the mapping relationship data between each commodity entity and its sample set may be stored independently. In another embodiment, the sample set of each commodity entity can also be stored as one of its attribute associations in the knowledge subgraph of the commodity entity.

Step S1400, carrying out semantic matching on the commodity information of the commodity entity to be detected and the sample set of the commodity entities in the candidate commodity set one by one, and determining a commodity entity matched with the commodity entity to be detected;

After receiving the commodity information of the commodity entity to be detected, or the sample set of a commodity entity in the candidate commodity set, the server can call a pre-trained text feature extraction model to extract deep semantic information from it and thereby obtain the corresponding semantic vector.

The text feature extraction model is trained to a convergence state in advance and performs deep-semantic representation learning on the text input to it. Those skilled in the art will know that the text feature extraction model can be used in the technical solution of the present application as long as it has been trained on sufficient samples to be suitable for extracting deep semantic information from the commodity information of the commodity entity to be detected or from the sample set of a commodity entity in the candidate commodity set. The text feature extraction model is generally a neural network model, preferably one implemented based on Bert, which is well suited to processing sequence information such as text. Subsequent embodiments of the present application further disclose preferred choices of model, which are not detailed here.

After the server obtains the semantic vector of the commodity entity to be detected and the semantic vector of the commodity entity in the candidate commodity set, semantic matching is carried out between the semantic vector of the commodity entity to be detected and the semantic vector of the commodity entity in the candidate commodity set. Specifically, the similarity between the semantic vector of the commodity entity to be detected and the semantic vector of the commodity entity in the candidate commodity set is calculated to perform semantic matching, and the semantic matching can be realized in various ways. For example:

In one embodiment, after the semantic vector of the commodity entity to be detected and the semantic vectors of the commodity entities in the candidate commodity set are obtained, a preset data distance algorithm is used to calculate the data distance between the semantic vector of each commodity entity in the candidate commodity set and the semantic vector of the commodity entity to be detected. The data distance algorithm includes, but is not limited to, cosine similarity, Euclidean distance, the Pearson correlation coefficient, Minkowski distance, Mahalanobis distance, the Jaccard coefficient, and the like; those skilled in the art may choose any such algorithm, as long as it can measure the distance between data points.

After the data distance is determined, in an optional embodiment it may, for convenience of calculation, be normalized to the numerical range [0, 1] such that a larger value indicates a closer distance and hence a higher likelihood that the commodity entity to be detected matches the corresponding commodity entity in the candidate commodity set.
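As one possible instance of this embodiment, the Python sketch below computes cosine similarity between two semantic vectors and maps it to [0, 1]. The linear mapping `(cos + 1) / 2` is an assumption chosen for illustration; the application does not state which normalization is used, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def normalized_score(u, v):
    """Map cosine similarity linearly to [0, 1]; larger means closer (assumed mapping)."""
    return (cosine_similarity(u, v) + 1.0) / 2.0


same_direction = normalized_score([1.0, 2.0], [2.0, 4.0])
orthogonal = normalized_score([1.0, 0.0], [0.0, 1.0])
opposite = normalized_score([1.0, 0.0], [-1.0, 0.0])
```

With this mapping, vectors pointing the same way score near 1, orthogonal vectors score 0.5, and opposite vectors score 0.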

In another embodiment, an index over the semantic vectors of the commodity entities in the candidate commodity set is built through the interface provided by the Faiss framework; the similarity calculation interface of the framework then applies the corresponding preset algorithm to obtain similarity data, so that the similarity between the semantic vector of the commodity entity to be detected and the semantic vectors of the commodity entities in the candidate commodity set is computed quickly.

The algorithm used to compute the similarity between the semantic vector of the commodity entity to be detected and the semantic vectors of the commodity entities in the candidate commodity set can be set flexibly. Once the similarity between the commodity entity to be detected and each commodity entity in the candidate commodity set has been calculated from the semantic vectors, the commodity entities in the candidate commodity set that meet the preset matching condition are screened out and regarded as matching the commodity entity to be detected.
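The final screening can be sketched as follows, assuming the preset matching condition is a similarity threshold with an optional cap on the number of results. The function name `screen_matches`, the threshold value, and the scores are hypothetical and serve only to illustrate the step.

```python
def screen_matches(scores, threshold=0.9, top_k=None):
    """Keep candidates whose similarity meets the assumed threshold, best first."""
    matches = [(entity_id, score) for entity_id, score in scores.items() if score >= threshold]
    matches.sort(key=lambda item: item[1], reverse=True)
    return matches if top_k is None else matches[:top_k]


# Made-up normalized similarity scores for three candidate commodity entities.
scores = {"sku-001": 0.97, "sku-002": 0.42, "sku-003": 0.91}
matched = screen_matches(scores)
best_only = screen_matches(scores, top_k=1)
```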

This exemplary embodiment demonstrates the advantages of the present application. On the server side, a sample set corresponding to each commodity entity is constructed in advance by structuring the commodity information of each commodity entity, and the reference information is enriched by gathering the description texts corresponding to the mapping relations between each attribute and its attribute values. When judging whether the candidate commodity set contains a commodity entity identical to the commodity entity to be detected, matching is performed with the semantic vectors obtained from the sample sets, which greatly improves matching accuracy and reduces the miss rate.

Furthermore, the server obtains the semantic vector of the commodity entity to be detected and the semantic vectors of the commodity entities in the candidate commodity set, calculates the similarity between them, and, according to the similarity, determines from the candidate commodity set the commodity entity that matches the commodity entity to be detected. The commodity entity identical to the commodity entity to be detected is thus screened out of the commodity database, which improves the retrieval and matching precision for infringing commodities on the e-commerce platform and avoids the large number of missed or false detections that arise when, as in traditional retrieval, matching relies on keywords alone.

Compared with the prior art, the application has various advantages, at least comprising the following aspects:

Firstly, for the e-commerce detection and matching scenario, the knowledge subgraph corresponding to the commodity entity to be detected is extracted, and the attribute coincidence degree is calculated between the mapping relation data in that knowledge subgraph and the mapping relation data in the knowledge subgraphs of the commodity entities in the knowledge graph. A plurality of knowledge subgraphs similar to that of the commodity entity to be detected are screened out of the knowledge graph and form the candidate commodity set. The description texts corresponding to the mapping relation data of the knowledge subgraphs in the candidate commodity set are then recalled, and semantic vectors are extracted from those description texts and from the commodity information of the commodity entity to be detected. Finally, the semantic vector of the commodity entity to be detected is semantically matched against the semantic vectors of the commodity entities in the candidate commodity set, so as to screen out the commodity entities that are identical or similar to the commodity entity to be detected. In this way, commodities identical or similar to the commodity to be detected can be screened efficiently from the commodity database.

Secondly, retrieval and recall are achieved by using the attribute coincidence degree between the commodity entity to be detected and each commodity entity in the knowledge graph, yielding the candidate commodity set. The commodity information of the commodity entity to be detected is then semantically matched against the sample set of each commodity entity in the candidate commodity set, which achieves a fine ranking of the commodity entities in the candidate commodity set and screens out the commodity entity matching the commodity entity to be detected. Each commodity entity in the candidate commodity set has a sample set prepared in advance, in which each attribute is associated with one or more description texts. These description texts are different representations of the same commodity entity, so they broaden the reference range of the commodity information of the commodity entity and provide richer reference information for semantic matching, allowing a more accurate matching result and ensuring retrieval accuracy.

Referring to fig. 2, in a further embodiment, the step S1100 of obtaining the commodity information of the commodity entity to be detected includes the following steps:

Step S1110, in response to a commodity release request from an online store, acquiring commodity information of the commodity entity to be detected corresponding to the request, wherein the commodity information includes any one or more of a commodity title, commodity details, and description text corresponding to commodity attribute data of the commodity entity:

The server responds to the commodity release request of the online store, acquires the commodity information of the commodity corresponding to the request, constructs that commodity information into a commodity entity at the data layer, and takes the commodity entity as the commodity entity to be detected.

From the commodity information of the commodity entity to be detected, the server obtains the various description texts, for example the commodity title "certain brand 2021 summer ultraviolet-proof ice-silk loose comfortable wind coat M15237", which is one of the description texts.

Step S1120, performing attribute extraction on the commodity information of the commodity entity to be detected, so as to extract corresponding attribute values from each description text according to the attributes, and form mapping relationship data between the attributes and the attribute values:

In order to obtain the mapping relation data between attributes and attribute values from the commodity information of the commodity entity, each description text first needs to be segmented. Word segmentation can be performed by means of an attribute extraction model implemented as a neural network model. The attribute extraction model comprises a text feature extraction module and a conditional random field module. The text feature extraction module is typically built on a Transformer-based underlying network architecture; base network models suited to processing sequence data, such as Lattice LSTM or Bert, may be selected to perform representation learning on the description text and obtain text feature vectors. The conditional random field module, namely a CRF (conditional random field) model, identifies the attributes of the description text based on the text feature vectors so as to extract them. Since the techniques by which these models implement named entity recognition are well known to those skilled in the art, they are not repeated herein. Of course, before the attribute extraction model is used in the present application, it is trained to a convergence state in advance so that it acquires the corresponding capability: performing representation learning on the embedded vectors obtained from the description text of the present application to obtain text feature vectors carrying deep semantic information, performing attribute recognition on the basis of the text feature vectors, and obtaining each attribute accordingly.

Based on the attributes identified by the attribute extraction model, the server extracts the corresponding attribute values from the description texts of the commodity information of the corresponding commodity entity. The obtained attribute values are matched with the attributes to determine which attribute each attribute value maps to, and the recognition result of the description text is obtained from this correspondence. For example, for the commodity information "certain brand 2021 summer ultraviolet-proof ice-silk loose comfortable wind coat M15237", the attribute extraction result is: ("brand", "certain brand"), ("applicable season", "summer"), ("function", "ultraviolet-proof"), ("fabric", "ice silk"), ("fit", "loose"), ("style", "wind coat"), ("model number", "M15237").
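The step of turning per-token labels into attribute-value pairs can be sketched in Python as below. The sketch assumes that the conditional random field module emits BIO-style tags (`B-attr`, `I-attr`, `O`), which is a common convention for sequence labelling but is not stated in the application; the function name `decode_bio`, the tokens, and the tags are made up for illustration, and no trained model is involved.

```python
def decode_bio(tokens, tags):
    """Group tokens tagged B-<attr>/I-<attr> into (attribute, value) pairs."""
    pairs = []
    current_attr, current_tokens = None, []

    def flush():
        # Emit the span collected so far, if any.
        if current_attr is not None:
            pairs.append((current_attr, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_attr, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_attr == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" tags and inconsistent "I-" tags both close the current span.
            flush()
            current_attr, current_tokens = None, []
    flush()
    return pairs


tokens = ["certain", "brand", "2021", "summer", "ice", "silk", "coat"]
tags = ["B-brand", "I-brand", "O", "B-season", "B-fabric", "I-fabric", "O"]
extracted = decode_bio(tokens, tags)
```

In a real pipeline the `tags` list would come from the trained attribute extraction model rather than being written by hand.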

Step S1130, according to a preset knowledge graph structure, constructing the mapping relation data between the attributes and attribute values of the commodity entity to be detected into a corresponding knowledge subgraph:

After acquiring the mapping relation data between the attributes and attribute values of the commodity entity to be detected, the server orders the mapping relation data and constructs the knowledge subgraph of the commodity entity to be detected according to the data structure of the preset knowledge graph.

In this embodiment, the attribute values corresponding to the commodity information are accurately extracted through attribute extraction, so that the knowledge subgraph of the commodity entity to be detected is accurately constructed. Matching against identical or similar commodity entities is subsequently performed according to the knowledge subgraph, and the attribute coincidence degree is calculated from the attribute values of the knowledge subgraph, so that as many candidate commodity entities as possible are effectively recalled and the purpose of retrieval and recall is achieved.

Referring to fig. 3, in a further embodiment, before the step of obtaining the commodity information of the commodity entity to be detected, the method includes the following steps:

Step S1000, constructing a knowledge graph corresponding to a commodity database, wherein the knowledge graph comprises knowledge subgraphs corresponding to all commodity entities in the commodity database, and the knowledge subgraphs comprise mapping relations between attributes and attribute values extracted from description texts of commodity information of the corresponding commodity entities:

The technical solution of the present application takes the operating environment of an e-commerce platform as its application environment. The e-commerce platform may be one that offers independent-site services, a typical example being a cross-border e-commerce service platform. Because such a platform must accommodate network environments across regions worldwide and the independence of merchants from one another, it configures each merchant's online store as a separate independent site, so that one e-commerce platform serves a large number of independent sites.

Each independent site has a commodity database corresponding to the commodities it sells, and the commodity database contains the commodity information of a large number of commodity entities. Accordingly, the commodity information of the commodity entities of each independent site can be obtained by accessing the commodity database of that site.

In the present application, the knowledge graph is constructed from the commodity database of the e-commerce platform or independent site that forms the application environment of the commodity entity to be detected. The server constructs corresponding knowledge subgraphs for all or some of the commodity entities in the commodity database. Owing to the graph structure of the knowledge graph, multiple knowledge subgraphs share the nodes that carry identical attribute-to-attribute-value mapping relation data, which reduces the data volume of the knowledge graph, saves storage space, and facilitates calling and associating the data.
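The node sharing described above can be illustrated with the following Python sketch, in which each distinct (attribute, attribute value) pair is stored once and linked to every commodity entity that has it. The class `KnowledgeGraph`, its methods, and the sample data are hypothetical; a production system would more likely use a graph database, which the application does not name.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Hypothetical in-memory graph where attribute-value nodes are shared."""

    def __init__(self):
        # One key per distinct (attribute, value) node, shared across entities.
        self.pair_to_entities = defaultdict(set)
        self.entity_to_pairs = defaultdict(set)

    def add_subgraph(self, entity_id, pairs):
        for pair in pairs:
            self.pair_to_entities[pair].add(entity_id)
            self.entity_to_pairs[entity_id].add(pair)

    def entities_sharing(self, pair):
        """List the commodity entities linked to a given attribute-value node."""
        return sorted(self.pair_to_entities.get(pair, ()))


kg = KnowledgeGraph()
kg.add_subgraph("sku-001", [("brand", "certain brand"), ("fabric", "ice silk")])
kg.add_subgraph("sku-002", [("brand", "other brand"), ("fabric", "ice silk")])
shared = kg.entities_sharing(("fabric", "ice silk"))
```

Although four pairs were added in total, only three distinct nodes are stored, because both entities point at the same ("fabric", "ice silk") node.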

Referring to fig. 4, in a detailed embodiment, the step S1000 of constructing a knowledge graph corresponding to a commodity database includes the following steps:

Step S1010, creating a knowledge graph, and acquiring commodity information of each commodity entity in a commodity database:

In response to a knowledge graph creation instruction, the server accesses the commodity database of the e-commerce platform or independent site that forms the application environment of the commodity entity to be detected, and retrieves the commodity information of all or some of the commodity entities from that database.

Step S1020, performing attribute extraction on the product information of each product entity to extract corresponding attribute values from each description text according to different attributes, and forming mapping relationship data between the attributes and the attribute values:

Using the attribute extraction model, attributes are identified in the description texts of the commodity information of each commodity entity in the corresponding database, the corresponding attribute values are extracted from the commodity information, and the attributes are paired with the attribute values to obtain the mapping relation data between the attributes and attribute values of each commodity entity.

Step S1030, according to the preset structure of the knowledge graph, constructing the mapping relation data between the attribute and the attribute value of each commodity entity into a knowledge subgraph of the corresponding commodity entity:

The server acquires the mapping relation data between the attributes and attribute values of the commodity entity and organizes it according to the graph structure of the knowledge graph, so as to construct the knowledge subgraph of the commodity entity.

Step S1040, obtaining different description versions of the commodity information of each commodity entity, each description version including a description text matching with the mapping relation data between each attribute and attribute value of the commodity entity, constructing samples corresponding to each version according to a uniform sequence, constructing a sample set of the corresponding commodity entity by all samples, and storing the sample set in a knowledge subgraph of the corresponding commodity entity:

The technical solution takes the operating environment of an e-commerce platform as its application environment. The commodity database stores commodity information for a massive number of commodities, and the commodity information uploaded to the commodity database by different users for the same commodity may be the same or different. Storing such identical or differing versions separately occupies the storage space of the commodity database repeatedly and wastes storage resources. The server therefore performs coincidence detection on the commodity information uploaded by users and aligns the same or different versions of commodity information uploaded by different users for the same commodity, so that the multiple items of commodity information of the same commodity are stored in the same location and can be called by the server subsequently.

After the server obtains the different versions of commodity information of the same commodity entity, it extracts the mapping relation data between attributes and attribute values from each version. Based on the mapping relation data extracted from each version, the server recalls the corresponding description texts from that version of the commodity information. Each version has a plurality of items of mapping relation data between attributes and attribute values, so a plurality of description texts can be recalled for each version, and these description texts together form one sample.

Because the same commodity entity has different versions of commodity information, different versions of commodity information may have different attribute values for the same attribute, that is, the same attribute of the same commodity entity may be mapped with multiple attribute values to form mapping relationship data between the attribute and the attribute value.

After obtaining the samples corresponding to the commodity information of each version of the same commodity entity, the server organizes the data of the description versions contained in the samples corresponding to each version of the same commodity entity according to the attributes, specifically, the description texts recalled by the same attribute are classified into one class, so that the description texts with the same attribute in each sample can be associated and called conveniently.
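Grouping description texts by attribute across the samples of one commodity entity might look like the Python sketch below. The function name `group_by_attribute` and the sample contents are made up for illustration, and representing each sample as a dictionary keyed by attribute is an assumption.

```python
from collections import defaultdict


def group_by_attribute(samples):
    """Collect, per attribute, the description texts found across all samples."""
    grouped = defaultdict(list)
    for sample in samples:
        for attribute, text in sample.items():
            grouped[attribute].append(text)
    return dict(grouped)


# Made-up samples, one per description version of the same commodity entity.
samples = [
    {"brand": "certain brand summer coat", "fabric": "ice silk coat"},
    {"brand": "certain brand sun-protective jacket"},
]
by_attribute = group_by_attribute(samples)
```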

Because the same commodity entity has different versions of commodity information, obtaining these versions allows the server to build an accurate portrait of the commodity entity, which makes it easier to perform similarity matching between a commodity entity to be screened and that commodity entity and improves matching accuracy.

Referring to fig. 5, in a further embodiment, the step S1200 of retrieving, from the knowledge graph, the candidate commodity set matched with the knowledge subgraph of the commodity entity to be detected includes the following steps:

Step S1210, obtaining mapping relation data between the attributes and the attribute values in the knowledge subgraph of the commodity entity to be detected as an attribute set:

The server calls the mapping relation data between the attributes and the attribute values in the knowledge subgraph of the commodity entity to be detected, and stores the called mapping relation data in the same attribute set.

Step S1220, the attribute set of the commodity entity to be detected and the attribute set corresponding to each commodity entity in the knowledge graph are subjected to coincidence degree matching calculation, and the attribute coincidence degree between each commodity entity in the knowledge graph and the commodity entity to be detected is determined:

The server calls, from the knowledge graph, the mapping relation data between the attributes and the attribute values of each knowledge subgraph, and stores the called mapping relation data in a corresponding attribute set, so that one attribute set is obtained for each knowledge subgraph, that is, for each commodity entity, in the knowledge graph.

Attribute overlap ratio matching calculation is performed on the attribute set of the commodity entity to be detected and the attribute set of each commodity entity in the knowledge graph, to determine the attribute overlap ratio between each commodity entity in the knowledge graph and the commodity entity to be detected. As a measure between the commodity entity to be detected and a commodity entity in the knowledge graph, an exemplary formula is as follows:

S_new,i＝Set_new,i / Set_i

wherein S_new,i represents the attribute overlap ratio between the attribute set of the commodity entity to be detected and the attribute set of the ith commodity entity in the knowledge graph, Set_new,i represents the number of items of mapping relation data between attributes and attribute values in the attribute set of the commodity entity to be detected that coincide with the mapping relation data between attributes and attribute values in the attribute set of the ith commodity entity in the knowledge graph, and Set_i represents the number of items of mapping relation data between attributes and attribute values in the attribute set of the ith commodity entity in the knowledge graph.

For convenience of calculation, the value range of the attribute overlap ratio is normalized to the numerical space [0,1], with limits 0 and 1. The larger the value, the closer the two commodity entities and the higher the attribute overlap ratio; conversely, the smaller the value, the greater the difference between the two and the lower the attribute overlap ratio.
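As a non-limiting illustration, the attribute overlap ratio can be sketched as follows, assuming that it is the number of coinciding (attribute, attribute value) pairs divided by the number of pairs in the attribute set of the ith commodity entity, which keeps the value within [0,1]. The function name `attribute_overlap`, the use of Python sets of tuples, and the example data are assumptions for illustration only.

```python
def attribute_overlap(set_new, set_i):
    """Assumed form: |Set_new ∩ Set_i| / |Set_i|, a value in [0, 1].

    set_new: (attribute, value) pairs of the commodity entity to be detected.
    set_i:   (attribute, value) pairs of the ith entity in the knowledge graph.
    """
    if not set_i:
        # An entity without any mapping data cannot overlap with anything.
        return 0.0
    return len(set_new & set_i) / len(set_i)

set_new = {("brand", "Acme"), ("color", "red"), ("size", "M")}
set_i = {("brand", "Acme"), ("color", "red"), ("material", "cotton"), ("size", "L")}
overlap = attribute_overlap(set_new, set_i)  # 2 coinciding pairs out of 4
```

Note that a pair only coincides when both the attribute and its value match, so ("size", "M") and ("size", "L") do not count.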

Step S1230, the commodity entity set whose attribute overlap ratio in the knowledge graph satisfies the preset condition is taken as a candidate commodity set:

For the commodity entity to be detected, the attribute overlap ratio between it and each commodity entity in the knowledge graph is calculated. The commodity entities whose attribute overlap ratio satisfies the preset condition are selected and sorted in descending order of attribute overlap ratio, thereby constructing the candidate commodity set, which consists of the commodity entities having a high attribute overlap ratio with the commodity entity to be detected.

The preset condition may be an attribute overlap threshold; this embodiment recommends that a commodity entity be selected from the commodity database into the candidate commodity set when its attribute overlap ratio is greater than the attribute overlap threshold. The attribute overlap threshold is set within the interval of 0.1 to 1, and those skilled in the art can flexibly adjust the specific value according to the actual service situation.
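As a non-limiting illustration, the threshold screening and descending sort of step S1230 can be sketched as follows. The function name `select_candidates`, the entity identifiers, and the threshold value 0.5 are hypothetical; the application only states that the threshold lies within the interval of 0.1 to 1.

```python
def select_candidates(overlaps, threshold=0.5):
    """Keep entities whose attribute overlap ratio exceeds the threshold,
    sorted from the highest ratio to the lowest.

    overlaps: dict mapping an entity id to its attribute overlap ratio.
    """
    kept = [(entity, ratio) for entity, ratio in overlaps.items() if ratio > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

overlaps = {"sku-1": 0.9, "sku-2": 0.3, "sku-3": 0.6}
candidates = select_candidates(overlaps, threshold=0.5)
```

Keeping the ratio alongside each entity identifier allows a later step to reuse it as a weight.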

By calculating the attribute overlap ratio from the attribute set of the commodity entity to be detected and the attribute sets of the commodity entities in the knowledge graph, a plurality of commodity entities with similar attributes can be selected from the knowledge graph while obviously dissimilar commodity entities are filtered out, thereby reducing the data processing amount of subsequent steps. Meanwhile, because attribute values are described precisely, data can be retrieved through the attribute values.

Referring to fig. 6, on the basis of the previous embodiment, the step S1400 of performing semantic matching between the commodity information of the commodity entity to be detected and the sample sets of the commodity entities in the candidate commodity set one by one, and determining therefrom the commodity entity matched with the commodity entity to be detected, includes the following steps:

Step S1410, calculating a first document similarity between the commodity information of the commodity entity to be detected and the sample set of each commodity entity in the candidate commodity set, weighting the first document similarity by the attribute overlap ratio corresponding to that commodity entity to obtain a weighted similarity, and screening out the commodity entities whose weighted similarity satisfies a preset condition to form a first commodity set:

To this end, document similarity calculation is performed, based on a document similarity algorithm, between the commodity information of the commodity entity to be detected and the sample set of each commodity entity in the candidate commodity set, yielding the first document similarity between the commodity entity to be detected and each commodity entity in the candidate commodity set.

The document similarity algorithm is any algorithm capable of calculating the similarity between texts, such as BM25 or TF-IDF. The BM25 algorithm is recommended in this embodiment, and those skilled in the art can flexibly change the specific implementation according to the actual service scenario.

In order to obtain the first document similarity between the commodity information of the commodity entity to be detected and the sample set of each commodity entity in the candidate commodity set, and as a measurement index of the commodity information of the commodity entity to be detected and the sample set of each commodity entity in the candidate commodity set on the document similarity level, an exemplary formula is as follows:

Srr_new,j＝S_new,i · BM25(X_new, X_j)

wherein Srr_new,j represents the weighted similarity, S_new,i is the attribute overlap ratio calculated above, used as the weight in the formula, and BM25() represents the BM25 algorithm.

According to the formula, the first document similarity between the commodity information of the commodity entity to be detected and each commodity entity of the candidate commodity set is calculated based on the document similarity algorithm, with the attribute overlap ratio calculated above used as the weight. For convenience of calculation, the value range of the first document similarity is normalized to a numerical space such as [0,1]: the larger the value, the higher the document similarity between the two; the smaller the value, the greater the difference between the two.
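As a non-limiting illustration, the weighted similarity can be sketched with a small self-contained BM25 scorer. The function names, the whitespace tokenization, the parameter values k1 = 1.5 and b = 0.75, the smoothed inverse document frequency, and the division by the maximum raw score as the normalization to [0,1] are all assumptions for illustration; the application does not specify them.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against the query with Okapi BM25."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n
    df = Counter()
    for d in docs_tokens:
        df.update(set(d))  # document frequency of each term
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays positive (an assumed choice).
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

def weighted_similarity(query_tokens, docs_tokens, overlaps):
    """Attribute overlap ratio times BM25 score scaled to [0, 1]."""
    raw = bm25_scores(query_tokens, docs_tokens)
    top = max(raw) or 1.0  # avoid dividing by zero when nothing matches
    return [s * (r / top) for s, r in zip(overlaps, raw)]

query = "red cotton shirt".split()
docs = [
    "red cotton shirt slim fit".split(),
    "blue denim jeans".split(),
    "cotton shirt".split(),
]
overlaps = [0.8, 0.9, 0.5]  # attribute overlap ratio of each candidate
weighted = weighted_similarity(query, docs, overlaps)
```

In this sketch the second candidate shares no term with the query, so its weighted similarity is zero even though its attribute overlap ratio is the highest of the three.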

For the commodity entity to be detected, the first document similarity between it and each commodity entity of the candidate commodity set is calculated. The commodity entities whose first document similarity satisfies the preset condition are selected and sorted in descending order of first document similarity, thereby constructing the first commodity set, which consists of the commodity entities having a higher first document similarity with the commodity entity to be detected.

The preset condition may be a first document similarity threshold; this embodiment recommends that a commodity entity be selected from the candidate commodity set into the first commodity set when its first document similarity is greater than the first document similarity threshold. The first document similarity threshold is set within the interval of 0.1 to 1, and those skilled in the art can flexibly adjust the specific value according to the actual service situation.

By calculating the first document similarity from the commodity information of the commodity entity to be detected and the sample set of each commodity entity of the candidate commodity set, a plurality of commodity entities with a similar document content can be selected from the candidate commodity set while obviously dissimilar commodity entities are filtered out, thereby reducing the data processing amount of subsequent steps.

Step S1420, calculating semantic similarity between the semantic vector of the commodity information of the commodity entity to be detected and the semantic vector of the sample set of commodity entities in the first commodity set, and screening out commodity entities with semantic similarity meeting preset conditions to form a second commodity set:

A text feature extraction model pre-trained to a convergence state is used to extract the corresponding semantic vector from the commodity information of the commodity entity to be detected and from the sample set of each commodity entity in the first commodity set. The text feature extraction model may adopt various neural network models suitable for extracting deep semantic information of a text, such as AlBert, ERNIE, ELECTRA, Bert, and the like; in a preferred embodiment, it is recommended that a Bert model be used as the text feature extraction model.

In a recommended embodiment, a Bert model is adopted as a text feature extraction model, and corresponding semantic vectors are extracted from the commodity information of the commodity entity to be detected and the sample set of the commodity entity in the first commodity set through the text feature extraction model, wherein an example formula is as follows:

V_j＝Bert(X_j)

wherein V_j represents the semantic vector of a commodity entity in the first commodity set, and j denotes the jth description text in the sample set of commodity entities in the first commodity set.

V_new＝Bert(X_new)

wherein V_new represents the semantic vector of the commodity to be detected, and X_new represents the description text of the commodity to be detected.

Similarly, the Bert model extracts the deep features of the sample set of each commodity entity of the first commodity set to obtain the corresponding semantic vectors, and the obtained semantic vectors are stored in a commodity feature library.

After the semantic vector of the commodity information of the commodity entity to be detected and the semantic vector of the sample set of the commodity entity in the first commodity set are obtained, semantic similarity between the semantic vector of the commodity information of the commodity entity to be detected and the semantic vector of the sample set of the commodity entity in the first commodity set is calculated.

The semantic similarity between the semantic vector of the commodity information of the commodity to be detected and the semantic vector of the sample set of each commodity entity in the first commodity set can be calculated one by one, using any vector similarity or distance measure such as cosine similarity, Euclidean distance or the Pearson coefficient. As a measure, at the semantic level, between the semantic vector of the commodity entity to be detected and the semantic vector of a commodity entity in the first commodity set, an exemplary formula is as follows:

Sc_new,j＝Cosine(V_new,V_j)

wherein Sc_new,j represents the semantic similarity, and Cosine() represents the cosine similarity algorithm.

According to the formula, the semantic similarity between the commodity information of the commodity entity to be detected and each commodity entity of the first commodity set is calculated based on the cosine similarity algorithm. For convenience of calculation, the value range of the semantic similarity is normalized to the numerical space [0,1]: the larger the value, the higher the semantic similarity between the two; the smaller the value, the greater the difference between the two and the lower the semantic similarity.
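As a non-limiting illustration, the cosine similarity between two semantic vectors and its normalization to [0,1] can be sketched as follows. Raw cosine similarity lies in [-1,1]; the linear mapping (cosine + 1) / 2 used here is an assumed way of normalizing, since the application does not state how the normalization is carried out. The function names and the toy two-dimensional vectors are likewise hypothetical.

```python
import math

def cosine(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_u * norm_v)

def semantic_similarity(v_new, v_j):
    """Cosine similarity mapped from [-1, 1] to [0, 1] (assumed mapping)."""
    return (cosine(v_new, v_j) + 1.0) / 2.0

same = semantic_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = semantic_similarity([1.0, 0.0], [0.0, 1.0])
opposite = semantic_similarity([1.0, 0.0], [-1.0, 0.0])
```

In practice V_new and V_j would be the vectors produced by the text feature extraction model rather than hand-written lists.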

For the commodity entity to be detected, the semantic similarity between it and each commodity entity of the first commodity set is calculated, and the commodity entities whose semantic similarity satisfies the preset condition are selected to construct the second commodity set, which consists of the commodity entities having a higher semantic similarity with the commodity entity to be detected.

The preset condition may be a semantic similarity threshold; this embodiment recommends that a commodity entity be selected from the first commodity set into the second commodity set when its semantic similarity is greater than the semantic similarity threshold. The semantic similarity threshold is set within the interval of 0.1 to 1, and those skilled in the art can flexibly adjust the specific value according to the actual business situation.

The second commodity set may include a single commodity entity that is the same commodity entity as the commodity entity to be detected but carries different commodity information; that is, the two are substantially the same commodity entity, and only the description texts of their respective commodity information differ.

Calculating the similarity between the semantic vector of the commodity entity to be detected and the semantic vectors of the commodity entities in the first commodity set based on the cosine similarity algorithm allows the commodity entities that are the same as the commodity entity to be detected to be selected from the first commodity set, thereby improving the efficiency with which the e-commerce platform detects identical commodity entities.

In one embodiment, the second commodity set may include a plurality of commodity entities, and the plurality of commodity entities are the same as or have higher similarity with the commodity entity to be detected. Alternatively, the plurality of commodity entities in the second commodity set are different versions of the same commodity entity.

Step S1430, taking the commodity entities in the second commodity set as the commodity entities matched with the commodity entity to be detected, and pushing them to the terminal device that provided the commodity information of the commodity entity to be detected:

In order to feed the similarity detection information of the commodity entity to be detected back to the terminal device, the obtained commodity information of the commodity entities in the second commodity set can be pushed to the terminal device and shown on its display interface.

After receiving the commodity information of the commodity entity in the second commodity set, the terminal device can display the commodity information on a display interface, so that a merchant user or a background user can obtain the similarity detection information of the commodity entity to be detected, and whether the commodity uploaded by the user infringes the rights and interests of the merchant user with the sales right is judged.

Referring to fig. 7, in a further embodiment, the step of calculating the semantic similarity between the semantic vector of the commodity information of the commodity entity to be detected and the semantic vector of the sample set of commodity entities in the first commodity set in step S1420 includes the following steps:

Step S1440, iteratively training a preset text feature extraction model to a convergence state, using the sample sets of the commodity entities in the knowledge graph as training samples:

In order to extract the semantic vector of the commodity information of a commodity entity, a text feature extraction model can be prepared. The text feature extraction model can be implemented on the basis of basic neural network models such as AlBert, ERNIE, ELECTRA and Bert, and those skilled in the art can train it to a convergence state with sufficient corresponding training samples, so that the model learns the capability of extracting a deep semantic vector from the input commodity information of a commodity entity.

The data set used for providing the training samples can directly adopt the sample set in the knowledge subgraph of each commodity entity in the knowledge graph. The commodity information of one version in the sample set, namely the description text corresponding to each attribute of the commodity entity, can be used as a training sample, so that massive training samples can be obtained, and the unsupervised training can be implemented on the text feature extraction model.

Step S1450, respectively extracting a sample set of each commodity entity in the first commodity set and a semantic vector of commodity information of a commodity entity to be detected by using the text feature extraction model:

The text feature extraction model trained to a convergence state is used to extract the corresponding semantic vector from the sample set of each commodity entity of the first commodity set and from the commodity information of the commodity entity to be detected. In one embodiment, the extracted semantic vectors may be stored in association with the knowledge subgraph of the corresponding commodity entity for direct recall.

According to the embodiment, the sample set in the knowledge graph can not only provide training samples for the text feature extraction model, but also play a role of reference information in the data retrieval and accuracy checking process, and the economic benefit is remarkable.

Referring to fig. 8, a commodity information matching apparatus adapted to one of the objectives of the present application is provided. The apparatus is a functional embodiment of the commodity information matching method of the present application and includes an information extraction module 1100, a retrieval matching module 1200, an extraction sorting module 1300, and a semantic matching module 1400, wherein: the information extraction module 1100 is configured to acquire commodity information of a commodity entity to be detected, the commodity information including a plurality of description texts, and to extract a knowledge subgraph thereof, the knowledge subgraph including mapping relationship data between attributes and attribute values extracted from the description texts; the retrieval matching module 1200 is configured to retrieve, from a knowledge graph, a plurality of commodity entities whose knowledge subgraphs match the knowledge subgraph of the commodity entity to be detected to form a candidate commodity set, wherein the knowledge graph stores the knowledge subgraphs corresponding to a plurality of commodity entities in a commodity database; the extraction sorting module 1300 is configured to obtain a sample set corresponding to each commodity entity in the candidate commodity set, wherein the sample set includes one or more description texts for each attribute in the corresponding knowledge subgraph, and all the description texts in the sample set are organized according to a unified sequence; and the semantic matching module 1400 is configured to perform semantic matching between the commodity information of the commodity entity to be detected and the sample sets of the commodity entities in the candidate commodity set one by one, and to determine the commodity entity matched with the commodity entity to be detected.

In some embodiments of the present disclosure, the information extracting module 1100 includes: the information extraction unit is used for responding to a commodity release request of an online shop and acquiring commodity information of a to-be-detected commodity entity corresponding to the request, wherein the commodity information comprises any one or more items of a commodity title, commodity details and a description text corresponding to commodity attribute data of the commodity entity; the attribute extraction unit is used for extracting the attributes of the commodity information of the commodity entity to be detected so as to extract corresponding attribute values from the description texts according to different attributes and form mapping relation data between the attributes and the attribute values; and the subgraph construction unit is used for constructing the mapping relation data between the attributes and the attribute values of the commodity entity to be detected into the corresponding knowledge subgraph according to the preset knowledge graph structure.

In a further embodiment, the commodity information matching apparatus further includes: a graph construction module, configured to construct a knowledge graph corresponding to the commodity database, wherein the knowledge graph includes a knowledge subgraph corresponding to each commodity entity in the commodity database, and each knowledge subgraph includes mapping relation data between attributes and attribute values extracted from the description texts of the commodity information of the corresponding commodity entity.

In a further embodiment, the graph construction module includes: a graph creating unit, configured to create a knowledge graph and acquire commodity information of each commodity entity in the commodity database; a mapping construction unit, configured to perform attribute extraction on the commodity information of each commodity entity, so as to extract corresponding attribute values from each description text according to different attributes and form mapping relation data between the attributes and the attribute values; a subgraph creating unit, configured to construct the mapping relation data between the attributes and the attribute values of each commodity entity into a knowledge subgraph of the corresponding commodity entity according to the preset structure of the knowledge graph; and a subgraph matching unit, configured to acquire different description versions of the commodity information of each commodity entity, wherein each description version includes the description texts matched with the mapping relation data between each attribute and attribute value of the commodity entity, to construct a sample corresponding to each description version according to a unified sequence, and to form all the samples into a sample set of the corresponding commodity entity and store the sample set in the knowledge subgraph of the corresponding commodity entity.

In a further embodiment, the retrieval matching module 1200 includes: a mapping and merging unit, configured to acquire the mapping relation data between the attributes and the attribute values in the knowledge subgraph of the commodity entity to be detected as the attribute set of the commodity entity to be detected; an overlap calculation unit, configured to perform overlap ratio matching calculation on the attribute set of the commodity entity to be detected and the attribute set corresponding to each commodity entity in the knowledge graph, and to determine the attribute overlap ratio between each commodity entity in the knowledge graph and the commodity entity to be detected; and a candidate screening unit, configured to collect the commodity entities in the knowledge graph whose attribute overlap ratio satisfies the preset condition into a candidate commodity set.

In a further embodiment, the semantic matching module 1400 includes: a first screening unit, configured to calculate a first document similarity between the commodity information of the commodity entity to be detected and the sample set of each commodity entity in the candidate commodity set, to weight the first document similarity by the attribute overlap ratio corresponding to that commodity entity to obtain a weighted similarity, and to screen out the commodity entities whose weighted similarity satisfies a preset condition to form a first commodity set; a second screening unit, configured to calculate the semantic similarity between the semantic vector of the commodity information of the commodity entity to be detected and the semantic vector of the sample set of each commodity entity in the first commodity set, and to screen out the commodity entities whose semantic similarity satisfies a preset condition to form a second commodity set; and a commodity pushing unit, configured to push the commodity entities in the second commodity set, as the commodity entities matched with the commodity entity to be detected, to the terminal device that provided the commodity information of the commodity entity to be detected.

In a further embodiment, the semantic matching module 1400 further includes: a model training unit, configured to iteratively train a preset text feature extraction model to a convergence state, using the sample sets of the commodity entities in the knowledge graph as training samples; and a vector extraction unit, configured to use the text feature extraction model to extract the semantic vector of the sample set of each commodity entity in the first commodity set and the semantic vector of the commodity information of the commodity entity to be detected.
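As a non-limiting illustration, the cooperation of the four modules 1100 to 1400 of the apparatus can be sketched as a chain of callables. The class name `MatchingApparatus`, the type aliases, and the toy stand-in functions and data are all hypothetical; real modules would perform the attribute extraction, overlap calculation, sample retrieval and semantic matching described in the method embodiments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Subgraph = Set[Tuple[str, str]]    # (attribute, attribute value) pairs
SampleSets = Dict[str, List[str]]  # entity id -> description texts

@dataclass
class MatchingApparatus:
    extract: Callable[[List[str]], Subgraph]             # information extraction module 1100
    retrieve: Callable[[Subgraph], List[str]]            # retrieval matching module 1200
    load_samples: Callable[[List[str]], SampleSets]      # extraction sorting module 1300
    match: Callable[[List[str], SampleSets], List[str]]  # semantic matching module 1400

    def run(self, texts: List[str]) -> List[str]:
        subgraph = self.extract(texts)
        candidates = self.retrieve(subgraph)
        samples = self.load_samples(candidates)
        return self.match(texts, samples)

# Toy stand-ins, purely illustrative.
KNOWN = {"sku-1": {("color", "red")}, "sku-2": {("color", "blue")}}
TEXTS = {"sku-1": ["red cotton shirt"], "sku-2": ["blue denim jeans"]}

apparatus = MatchingApparatus(
    extract=lambda texts: {("color", w) for t in texts for w in t.split()
                           if w in {"red", "blue"}},
    retrieve=lambda sg: [e for e, attrs in KNOWN.items() if attrs & sg],
    load_samples=lambda ids: {e: TEXTS[e] for e in ids},
    match=lambda texts, samples: [
        e for e, docs in samples.items()
        if set(" ".join(texts).split()) & set(" ".join(docs).split())
    ],
)
matched = apparatus.run(["red shirt with cotton lining"])
```

Passing the modules in as callables keeps each stage replaceable, which mirrors the way the application lets the similarity algorithm and the feature extraction model be chosen flexibly.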

In order to solve the technical problem, an embodiment of the present application further provides a computer device. Fig. 9 is a schematic diagram of the internal structure of the computer device. The computer device includes a processor, a computer-readable storage medium, a memory, and a network interface connected by a system bus. The computer-readable storage medium of the computer device stores an operating system, a database and computer-readable instructions; the database can store control information sequences, and when the computer-readable instructions are executed by the processor, the processor can implement a commodity information matching method. The processor of the computer device provides calculation and control capability and supports the operation of the whole computer device. The memory of the computer device may store computer-readable instructions which, when executed by the processor, cause the processor to execute the commodity information matching method of the present application. The network interface of the computer device is used for connecting to and communicating with a terminal. Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of the part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In this embodiment, the processor is configured to execute the specific functions of each module and its sub-modules in fig. 8, and the memory stores the program codes and various data required for executing those modules or sub-modules. The network interface is used for data transmission to and from a user terminal or a server. The memory in this embodiment stores the program codes and data necessary for executing all modules and sub-modules of the commodity information matching apparatus of the present application, and the server can call these program codes and data to execute the functions of all sub-modules.

The present application further provides a storage medium storing computer readable instructions, which when executed by one or more processors, cause the one or more processors to perform the steps of the method for matching commodity information according to any one of the embodiments of the present application.

The present application also provides a computer program product comprising computer programs/instructions which, when executed by one or more processors, implement the steps of the method as described in any of the embodiments of the present application.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments of the present application may be implemented by a computer program instructing the relevant hardware, where the computer program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a computer-readable storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).

In summary, the technical solution of the present application is suited to commodity infringement detection scenarios. When it is necessary to judge whether the commodity entity to be detected infringes on certain commodity entities, a further precise check is performed on top of the initial check: missed detections are avoided through the effect of the attribute overlap ratio, and the problem that the commodity information of the commodity entity to be detected is partially missing is overcome by means of the reference information provided by the sample set, so that infringing commodities are accurately identified.

Those of skill in the art will appreciate that the various operations, methods, steps in the processes, acts, or solutions discussed in this application can be interchanged, modified, combined, or eliminated. Further, various operations, methods, steps, measures, schemes in the various processes, methods, procedures that have been discussed in this application may be alternated, modified, rearranged, decomposed, combined, or eliminated. Further, the steps, measures, and schemes in the various operations, methods, and flows disclosed in the present application in the prior art can also be alternated, modified, rearranged, decomposed, combined, or deleted.

The foregoing describes only some of the embodiments of the present application. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.

Claims

1. A commodity information matching method, characterized by comprising the following steps:

and carrying out semantic matching on the commodity information of the commodity entity to be detected and the sample set of the commodity entities in the candidate commodity set one by one, and determining the commodity entity which is matched with the commodity entity to be detected.

2. The commodity information matching method according to claim 1, wherein the step of obtaining commodity information of a commodity entity to be detected comprises the steps of:

responding to a commodity publishing request of an online shop, and acquiring commodity information of the commodity entity to be detected corresponding to the request, wherein the commodity information comprises any one or more of a commodity title, commodity details, and a description text corresponding to commodity attribute data of the commodity entity;

3. The commodity information matching method according to claim 1, wherein before the step of obtaining the commodity information of the commodity entity to be detected, the method comprises the steps of:

4. The commodity information matching method according to claim 3, wherein constructing a knowledge graph corresponding to the commodity database comprises the steps of:

performing attribute extraction on the commodity information of each commodity entity, so as to extract the corresponding attribute values from each description text according to the different attributes and form mapping relation data between the attributes and the attribute values;

obtaining different description versions of the commodity information of each commodity entity, wherein each description version comprises description text matched with the mapping relation data between each attribute and attribute value of the commodity entity; constructing a sample corresponding to each description version in a unified order, all samples forming the sample set of the corresponding commodity entity; and storing the sample set in the knowledge subgraph of the corresponding commodity entity.
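The sample construction recited in claim 4 can be illustrated with a minimal sketch. It assumes each description version has already been reduced to a mapping from attributes to attribute values. The attribute names, the `ATTRIBUTE_ORDER` list, the `build_sample` and `build_sample_set` helpers, and the dictionary standing in for a knowledge subgraph are all hypothetical and are not taken from the application; the claim only requires that every sample follow one unified order.

```python
from typing import Dict, List

# Hypothetical unified attribute order; any fixed order would satisfy the claim.
ATTRIBUTE_ORDER: List[str] = ["brand", "category", "color", "material"]


def build_sample(attribute_values: Dict[str, str]) -> str:
    """Serialize one description version's attribute/value mapping in the unified order."""
    parts = [
        f"{attr}: {attribute_values[attr]}"
        for attr in ATTRIBUTE_ORDER
        if attr in attribute_values
    ]
    return "; ".join(parts)


def build_sample_set(description_versions: List[Dict[str, str]]) -> List[str]:
    """Build one sample per description version, skipping empty and duplicate samples."""
    samples: List[str] = []
    for version in description_versions:
        sample = build_sample(version)
        if sample and sample not in samples:
            samples.append(sample)
    return samples


# Two description versions of the same (made-up) commodity entity.
versions = [
    {"color": "red", "brand": "Acme", "category": "dress"},
    {"brand": "Acme", "category": "dress", "material": "cotton"},
]

# A plain dictionary stands in for the knowledge subgraph of the commodity entity.
subgraph = {"entity_id": "sku-001", "sample_set": build_sample_set(versions)}
```

Because every sample is serialized in the same order, two description versions that list the same attributes in different orders yield identical text, which keeps later comparisons consistent.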

5. The commodity information matching method according to claim 1, wherein retrieving, from the knowledge graph, a plurality of commodity entities whose knowledge subgraphs match the knowledge subgraph of the commodity entity to be detected to form a candidate commodity set comprises the following steps:

6. The commodity information matching method according to claim 5, wherein performing semantic matching, one by one, between the commodity information of the commodity entity to be detected and the sample sets of the commodity entities in the candidate commodity set, and determining therefrom the commodity entity matched with the commodity entity to be detected, comprises the following steps:

calculating semantic similarity between a semantic vector of commodity information of a commodity entity to be detected and a semantic vector of a sample set of commodity entities in a first commodity set, and screening out commodity entities with semantic similarity meeting preset conditions to form a second commodity set;

and pushing the commodity entity in the second commodity set as a commodity entity matched with the commodity entity to be detected to the terminal equipment providing the commodity information of the commodity entity to be detected.
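The similarity screening recited in claim 6 can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the application: the semantic vectors are assumed to be precomputed by some text encoder, cosine similarity is assumed as the similarity measure, each commodity entity is scored by the best-matching sample in its sample set, and the preset condition is assumed to be a fixed threshold of 0.8. The function names and sample data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def screen_second_set(
    query_vector: Sequence[float],
    first_set: Dict[str, List[Sequence[float]]],
    threshold: float = 0.8,  # assumed preset condition
) -> List[str]:
    """Keep the entities whose best sample similarity reaches the threshold."""
    second_set: List[str] = []
    for entity_id, sample_vectors in first_set.items():
        best = max(
            (cosine_similarity(query_vector, v) for v in sample_vectors),
            default=0.0,
        )
        if best >= threshold:
            second_set.append(entity_id)
    return second_set


# Made-up vectors: the commodity entity to be detected and a first commodity set.
query = [1.0, 0.0, 1.0]
first_set = {
    "sku-001": [[1.0, 0.0, 0.9], [0.0, 1.0, 0.0]],
    "sku-002": [[0.0, 1.0, 0.0]],
}
matches = screen_second_set(query, first_set)
```

Scoring by the best-matching sample lets a single well-matched description version qualify a commodity entity even when its other versions differ from the commodity information under test; averaging over the sample set would be an equally plausible reading of the claim.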

7. The commodity information matching method according to claim 6, wherein before the step of calculating the semantic similarity between the semantic vector of the commodity information of the commodity entity to be detected and the semantic vector of the sample set of commodity entities in the first commodity set, the method comprises the following steps:

8. A commodity information matching apparatus, characterized by comprising:

9. A computer device comprising a central processing unit and a memory, characterized in that the central processing unit is configured to invoke and run a computer program stored in the memory to perform the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that it stores, in the form of computer-readable instructions, a computer program implementing the method according to any one of claims 1 to 7, which, when invoked and run by a computer, performs the steps of the corresponding method.