CN116304104A - Knowledge graph construction method, knowledge graph construction device, medium and electronic equipment

Info

Publication number
CN116304104A
Authority
CN
China
Prior art keywords
visual
label
tag
labels
mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310300518.6A
Other languages
Chinese (zh)
Inventor
罗彤
李亚乾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jinsheng Communication Technology Co ltd
Original Assignee
Shanghai Jinsheng Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jinsheng Communication Technology Co ltd filed Critical Shanghai Jinsheng Communication Technology Co ltd
Priority to CN202310300518.6A priority Critical patent/CN116304104A/en
Publication of CN116304104A publication Critical patent/CN116304104A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using information manually generated, e.g. tags, keywords, comments, manually generated location and time information

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a knowledge graph construction method, a knowledge graph construction device, a computer readable storage medium and electronic equipment, and relates to the technical field of artificial intelligence. The knowledge graph construction method comprises the following steps: acquiring a plurality of visual labels and determining the relations among the visual labels; generating nodes based on the visual labels, and generating edges between the nodes based on the relations between the visual labels, so as to construct a visual label knowledge graph; and adding node attributes to the nodes according to the characteristics of the visual labels, and adding edge attributes to the edges according to the relation types among the visual labels. The method and the device can generate an accurate and reliable visual label knowledge graph, which can then be widely applied to scenes related to visual labels.

Description

Knowledge graph construction method, knowledge graph construction device, medium and electronic equipment
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to a knowledge graph construction method, a knowledge graph construction device, a computer readable storage medium and electronic equipment.
Background
A knowledge graph models and records the associations and knowledge among things using a graph structure, so as to effectively realize more accurate object-level search. In recent years, with the rapid development of fields such as natural language processing, deep learning, and graph data processing, knowledge graph technologies have been widely applied in search engines, intelligent question answering, language understanding, recommendation, big data decision analysis, and many other areas. Knowledge graphs have become one of the indispensable technologies for realizing artificial intelligence at the cognitive level. However, the prior art generally focuses only on how to apply existing knowledge graphs to specific scenarios; there is no accurate and reliable visual label knowledge graph that can be widely applied in the visual field.
Disclosure of Invention
The disclosure provides a knowledge graph construction method, a knowledge graph construction device, a computer readable storage medium and electronic equipment, so as to solve, at least to a certain extent, the problem that the prior art lacks a reliable and effective knowledge graph in the field of visual labels.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to a first aspect of the present disclosure, there is provided a knowledge graph construction method, including: acquiring a plurality of visual labels and determining the relation among the visual labels; generating nodes based on the visual labels, and generating edges between the nodes based on the relation between the visual labels so as to construct a visual label knowledge graph; and adding node attributes to the nodes according to the characteristics of the visual labels, and adding edge attributes to the edges according to the relation types among the visual labels.
According to a second aspect of the present disclosure, there is provided a knowledge graph construction apparatus including: the label acquisition module is used for acquiring a plurality of visual labels and determining the relation among the visual labels; the map generation module is used for generating nodes based on the visual labels, and generating edges between the nodes based on the relation between the visual labels so as to construct a visual label knowledge map; and the attribute adding module is used for adding node attributes to the nodes according to the characteristics of the visual labels and adding edge attributes to the edges according to the relationship types among the visual labels.
According to a third aspect of the present disclosure, there is provided an image description method including: acquiring a visual tag knowledge graph, wherein the visual tag knowledge graph is constructed by the knowledge graph construction method in the first aspect; obtaining a main label of the image to be described according to the result of identifying the image to be described; mapping the main label by using the visual label knowledge graph to obtain a mapping label; and generating the description information of the image to be described based on the main label or a label phrase formed by the main label and the mapping label.
According to a fourth aspect of the present disclosure, there is provided an image description apparatus including: the knowledge graph acquisition module is used for acquiring a visual tag knowledge graph, and the visual tag knowledge graph is constructed by the knowledge graph construction method in the first aspect; the main label acquisition module is used for acquiring a main label of the image to be described according to the result of identifying the image to be described; the mapping tag acquisition module is used for mapping the main tag by using the visual tag knowledge graph to obtain a mapping tag; and the descriptive information acquisition module is used for generating descriptive information of the image to be described based on the main label or a label phrase formed by the main label and the mapping label.
According to a fifth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the knowledge graph construction method of the first aspect, the image description method of the third aspect and possible implementations thereof.
According to a sixth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and the memory is used for storing executable instructions of the processor. Wherein the processor is configured to perform the knowledge graph construction method of the first aspect, the image description method of the third aspect and possible implementations thereof via execution of the executable instructions.
The technical scheme of the present disclosure has the following beneficial effects:
acquiring a plurality of visual labels and determining the relations among the visual labels; generating nodes based on the visual labels, and generating edges between the nodes based on the relations between the visual labels, so as to construct a visual label knowledge graph; and adding node attributes to the nodes according to the characteristics of the visual labels, and adding edge attributes to the edges according to the relationship types among the visual labels. First, the present exemplary embodiment provides a knowledge graph construction method that, by using visual labels as nodes and the relationships between visual labels as edges, can construct an accurate and reliable visual label knowledge graph, providing a reliable processing basis for the field of visual labels. Second, the present exemplary embodiment adds corresponding attributes for the visual label nodes and for the relationship types between visual labels, further perfecting the visual label knowledge graph and enriching its knowledge information. Third, the present exemplary embodiment can form a visual label knowledge graph for visual label processing through a simple and convenient flow, and has a wide range of application.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
Fig. 1 shows a schematic diagram of a system architecture in the present exemplary embodiment;
fig. 2 shows a flowchart of a knowledge graph construction method in the present exemplary embodiment;
FIG. 3 illustrates a schematic diagram of a category definition of a visual tag in the present exemplary embodiment;
FIG. 4 illustrates a schematic diagram of a category definition of another visual tag in the present exemplary embodiment;
fig. 5 shows a flowchart of an image description method in the present exemplary embodiment;
fig. 6 shows a flowchart of another image description method in the present exemplary embodiment;
fig. 7 shows a block diagram of a knowledge graph construction apparatus in the present exemplary embodiment;
fig. 8 is a block diagram showing a configuration of an image description apparatus in the present exemplary embodiment;
Fig. 9 shows a structural diagram of an electronic device in the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. However, those skilled in the art will recognize that the aspects of the present disclosure may be practiced with one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
The exemplary embodiment of the present disclosure first provides a knowledge graph construction method. The system architecture of the operating environment of the present exemplary embodiment is described below in conjunction with fig. 1.
Referring to fig. 1, a system architecture 100 may include a terminal 110 and a server 120. The terminal 110 may be an electronic device such as a mobile phone, a tablet computer, or an intelligent wearable device. The server 120 generally refers to the background system that generates the knowledge graph in this exemplary embodiment, and may be a single server or a cluster of multiple servers. The terminal 110 and the server 120 may form a connection through a wired or wireless communication link for data interaction.
In one embodiment, the knowledge graph construction method in the present exemplary embodiment may be performed by the terminal 110. For example, the terminal 110 may obtain visual tags from a local album, a network or other cloud platform, determine the relationship between the visual tags, and further execute a method for constructing a visual tag knowledge graph.
In one embodiment, the knowledge graph construction method in the present exemplary embodiment may be performed by the server 120. For example, the server 120 may obtain a plurality of visual tags and relationships between the visual tags from the terminal 110, and further perform a method of constructing a visual tag knowledge graph; or the terminal 110 may upload a plurality of visual tags to the server 120, determine a relationship between the visual tags by the server 120, perform a construction method of a visual tag knowledge graph, and the like.
As can be seen from the above, in the present exemplary embodiment, the implementation subject of the knowledge graph construction method may be the terminal 110 or the server 120, which is not limited in the present disclosure.
The flow of the knowledge graph construction method is described below with reference to fig. 2. Referring to fig. 2, the knowledge graph construction method may include the following steps S210 to S230:
step S210, a plurality of visual tags are acquired, and a relationship between the visual tags is determined.
Here, a visual label refers to a classification label for an object with visual characteristics, where the object may be a thing, a scene, a building, an event, or the like in an image. An object with visual characteristics is one that can be distinguished from other things based on vision. For example, "cat" and "dog" have obvious visual characteristics and can have corresponding visual labels; "wedding", while not corresponding directly to a physical thing, has a certain visual distinctiveness, such as a photograph of a wedding scene containing a church, and so may have a visual label; "you" or "me" have no visual features and therefore have no corresponding visual labels. In the present exemplary embodiment, one visual label may denote a class of visually characterized objects; for example, a plurality of images containing dogs, whether images of dog walking, images of a pet store, or close-up images of a dog, may all correspond to the single visual label "dog".
The relationships between visual labels may be used to reflect the relationship type between visual labels, their degree of correlation, and so on; for example, the visual labels "cat" and "dog" are both hyponyms of the visual label "pet", and the visual labels "balloon" and "wedding" have a high degree of correlation.
In this exemplary embodiment, the visual tag may be obtained from multiple places, for example, may be extracted from an image in a local album, may be obtained from a picture downloaded in the cloud, may be collected based on an existing visual tag library, and so on.
In an exemplary embodiment, in step S210, acquiring a plurality of visual tags may include:
obtaining candidate labels, wherein the candidate labels comprise at least one of the following: requirement labels in visual business, labels supported by an image classification function, search words meeting preset search conditions in image search, and words obtained by word segmentation of image description sentences;
visual tags are screened from the candidate tags.
Considering that the content in the visual label knowledge graph needs to be highly correlated with the visual content of images, the present exemplary embodiment may determine candidate labels from a variety of sources and then screen the candidate labels to obtain the visual labels.
Specifically, the candidate tag may be a requirement tag in visual business, for example, a tag in a terminal or a device under album classification requirement, such as a pet tag, a portrait tag, a building tag, or the like; or a label under the search requirement in the terminal or the device, such as a portrait label proposed when the user searches for a portrait in the album, or a landscape label proposed when searching for a landscape, etc.
The candidate labels may also be labels supported by an image classification function. For example, a terminal album may perform image recognition on all or part of the locally stored images, classify them according to the recognition results, and assign a category name to each category; these category names may be used as candidate labels.
The candidate labels may also be search words satisfying a preset search condition in image search. The preset search condition may identify high-frequency search words; specifically, it may require that the search frequency be greater than a preset frequency threshold, or select a preset number of top-ranked search words by frequency. For example, if users frequently search for dogs, cats, rabbits, roses, and so on when searching images, the top-ranked search words such as "dog", "cat", and "rabbit" may be used as candidate labels. The preset frequency threshold, the number of top-ranked words, and other preset search conditions can be defined according to actual requirements, and the disclosure is not specifically limited in this respect.
The candidate labels can also be words obtained by segmenting an image description sentence. For example, an image is analyzed to obtain the description sentence "a girl walks a dog by the lake", and word segmentation of this sentence yields words such as "girl", "lake", "walk a dog", and "dog". Candidate labels can then be obtained from these words: all of the words may be used as candidate labels, the top-ranked words may be used, or several words may be selected at random. The analyzed images may come from various databases, such as a caption database or other image databases, and the disclosure is not specifically limited in this respect.
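As an illustration of this word-segmentation source, the following minimal Python sketch derives candidate labels from a caption sentence; the jieba segmenter, the caption, and the stop-word set are illustrative assumptions, not part of the disclosure:

```python
# A hypothetical sketch of the word-segmentation source of candidate labels.
# The jieba segmenter, the caption sentence, and the stop-word set are
# illustrative assumptions, not part of the disclosure.
import jieba

caption = "女孩在湖边遛狗"  # "a girl walks a dog by the lake"
stop_words = {"在", "的"}   # function words to drop, chosen for illustration

candidates = [w for w in jieba.lcut(caption) if w not in stop_words]
print(candidates)  # e.g. ['女孩', '湖边', '遛狗'] -> candidate labels
```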
In an exemplary embodiment, the above-mentioned screening the visual tag from the candidate tags includes at least one of the following screening modes:
filtering candidate labels without visual features from the candidate labels;
determining visual distinction between different candidate labels, and if the visual distinction between a plurality of candidate labels is lower than a visual distinction threshold, de-duplicating the plurality of candidate labels;
filtering candidate labels with the commonness lower than a preset commonness threshold value from the candidate labels;
Candidate tags that do not have a particular purpose are filtered out of the candidate tags.
To improve the accuracy and effectiveness of the obtained visual labels, after the candidate labels are obtained they may be screened according to specific criteria, so as to select from them the visual labels required in the present exemplary embodiment. The screening may use at least one of the foregoing modes: any single mode may be used, or several modes may be used together; when several modes are used together, the screening steps may be performed in a certain order or simultaneously, which is not specifically limited in this disclosure.
Specifically, candidate labels that do not have visual features are filtered out. For example, the candidate label "cat" may be kept as a visual label, while times, places without distinctive landmarks, persons, and other things that cannot be judged from purely visual features need to be filtered out; "today", "Shanghai", and "me" may thus be filtered out.
Next, the visual distinctiveness between different candidate labels is determined, and if the visual distinctiveness between several candidate labels is below a visual distinctiveness threshold, those candidate labels are de-duplicated. The visual distinctiveness threshold is a criterion for judging whether two objects are easy to tell apart: when the visual distinctiveness of two objects is above the threshold, the two objects are easy to distinguish and can serve as two labels; when it is below the threshold, the two objects are hard to distinguish and may be the same or similar things. For example, "poodle" and "Teddy" have low visual distinctiveness and point to the same category, so the candidate labels may be de-duplicated, keeping only "poodle" or only "Teddy".
Considering that visual labels should have a certain degree of commonness, candidate labels whose commonness is below a preset commonness threshold can be filtered out. This can be regarded as filtering out very rare objects and scenes, as well as names that do not match everyday usage; for example, rodeo performances are less common and would be filtered out, while jeans are relatively common and can be retained; the very rare "Zhilou" is filtered out, while the common name "Ji doll" can be added, and so on. In this exemplary embodiment, whether a candidate label is common may be measured by the frequency with which the label occurs, with the preset commonness threshold expressed as a frequency threshold. The judgment of common versus uncommon words may be based on the frequency of a word in an existing lexicon, or the frequency of the corresponding thing in a gallery: for example, when the frequency of a word in an existing lexicon is below a preset threshold, the commonness of the word is considered below the preset commonness threshold and the word is uncommon. Alternatively, the judgment may be made by counting the interval between occurrences of the same word over a period of time: when the interval is longer than a preset time, the word occurs infrequently and is considered uncommon, so the preset commonness threshold may also be characterized as an occurrence-interval threshold. When the commonness of a candidate label is below the preset commonness threshold, it is filtered out in this step.
In addition, visual labels should usually have some search or recall value to the user; that is, the user may in some circumstances need to locate the images related to a label for recall, sharing, and so on. Thus, candidate labels that serve no particular purpose may be filtered out, where particular purposes may include sharing, recording, etc.; for example, various objects or goods may serve a sharing purpose and screenshots may serve a recording purpose, so such labels may be retained.
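The screening modes above can be combined into a single pass. The following Python sketch is one possible arrangement; the predicate functions and both thresholds are hypothetical stand-ins for the models or curated lists described above, none of which is prescribed by the disclosure:

```python
# A hypothetical single-pass arrangement of the four screening modes.
COMMONNESS_THRESHOLD = 0.01      # illustrative commonness (frequency) threshold
DISTINCTIVENESS_THRESHOLD = 0.5  # illustrative visual-distinctiveness threshold

def screen_visual_labels(candidates, has_visual_feature, visual_distinctiveness,
                         commonness, has_user_purpose):
    kept = []
    for label in candidates:
        if not has_visual_feature(label):             # drop "today", "me", ...
            continue
        if commonness(label) < COMMONNESS_THRESHOLD:  # drop very rare terms
            continue
        if not has_user_purpose(label):  # drop labels with no search/recall value
            continue
        # de-duplicate labels that are visually hard to tell apart
        if any(visual_distinctiveness(label, k) < DISTINCTIVENESS_THRESHOLD
               for k in kept):
            continue
        kept.append(label)
    return kept
```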
Step S220, generating nodes based on the visual labels, and generating edges between the nodes based on the relation between the visual labels to construct a visual label knowledge graph.
The present exemplary embodiment may construct a visual label knowledge graph using visual labels as nodes and the relationships between visual labels as the edges connecting those nodes. In the visual label knowledge graph, each node, i.e. entity, represents a category label distinguishable from the others, which may be a concrete visual object, such as a pet, or an abstract visual concept, such as a wedding. Each edge represents a link between categories; in this way, a knowledge database about visual labels is accurately built up as the visual label knowledge graph. In the present exemplary embodiment, each node may have a corresponding ID (identity).
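As a concrete illustration of step S220, the following Python sketch builds such a graph with the networkx library; the library choice and the sample labels are illustrative assumptions only:

```python
# A minimal sketch of step S220: visual labels as nodes, relationships as edges.
import networkx as nx

kg = nx.DiGraph()  # directed, so hypernym/hyponym edges can carry direction

# Visual labels become nodes, each with an ID.
for node_id, name in enumerate(["pet", "cat", "dog", "wedding", "balloon"]):
    kg.add_node(node_id, name=name)
ids = {kg.nodes[n]["name"]: n for n in kg.nodes}

# Relationships between visual labels become edges.
kg.add_edge(ids["cat"], ids["pet"], relation="hypernym")      # cat -> pet
kg.add_edge(ids["dog"], ids["pet"], relation="hypernym")      # dog -> pet
kg.add_edge(ids["balloon"], ids["wedding"], relation="related")
```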
Step S230, adding node attributes to the nodes according to the characteristics of the visual labels, and adding edge attributes to the edges according to the relationship types among the visual labels.
In this exemplary embodiment, each node and edge in the constructed visual tag knowledge graph may have its corresponding attributes, namely, a node attribute and an edge attribute.
In an exemplary embodiment, the features of the visual tag may include: the category definition of the visual tag, an example image of the visual tag, a visual tag score, the visibility of the visual tag, the frequency of use of the visual tag by users, the foreign-language name of the visual tag, the scene of the visual tag, the sentence component to which the visual tag belongs, the part of speech of the visual tag, error-correction words of the visual tag, synonyms of the visual tag, near-synonyms of the visual tag, and associative search words of the visual tag;
adding node attributes to the nodes according to the features of the visual tag, including:
one or more of the features of the visual tag are added to the associated data of the node as node attributes of the node.
The present exemplary embodiment may first determine multiple features of the visual tag, and then add one or more of these features to the associated data of a node as the node attributes of that node, that is, configure corresponding feature attributes for each node. The specific features of the visual tag and their descriptions are shown in Table 1 below:
TABLE 1 (content embedded as images in the source; not reproduced here)
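Continuing the networkx sketch above, node attributes drawn from the feature list might be attached as follows; every attribute value here is invented for illustration:

```python
# Attaching illustrative node attributes to the "cat" node of the sketch graph.
kg.nodes[ids["cat"]].update({
    "category_definition": "a domestic cat clearly visible as the image subject",
    "foreign_name": "cat",
    "part_of_speech": "noun",
    "sentence_component": "subject/object",
    "synonyms": ["kitty"],
    "associative_search_words": ["pet", "kitten"],
})
```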
In an exemplary embodiment, the knowledge graph construction method may further include the following steps:
identifying an object in an image and determining the proportion of the object in the image;
and if the proportion of the object in the image exceeds a preset threshold, determining the category definition of a visual tag according to the object.
An image may typically include a variety of objects or elements; for example, a cat and a dog in the same image may give rise to a visual label for the cat as well as one for the dog. In order to improve the effectiveness of an image's visual labels and ensure the reliability of knowledge graph construction, the present exemplary embodiment may identify the objects in the image, such as persons, plants, buildings, or animals, and then determine the proportion of each object in the image; when the proportion of an identified object exceeds a preset threshold, the visual label of the image is determined based on that object. For example, in the image shown in fig. 3, an animal including a cat is identified, and when the proportion of the cat's width and height relative to the image exceeds a certain degree, the cat is considered a specific subject of the image and the category of the visual label is defined as cat. In the image shown in fig. 4, the cat is blocked by an obstacle and its proportion in the image does not meet the preset rule, so no cat visual label is assigned. The proportion of an object in the image may be the ratio of the object's area to the image area, the ratio of the object's width and height to the image width and height, the ratio of the number of pixels in the object's region to the number of pixels in the image, and so on.
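A minimal sketch of this proportion check, using a bounding-box area ratio; the 0.3 threshold is illustrative, as the disclosure fixes no specific value:

```python
# Decide whether a detected object is prominent enough to define the category.
AREA_RATIO_THRESHOLD = 0.3  # illustrative; the disclosure fixes no value

def subject_defines_category(box, image_w, image_h,
                             threshold=AREA_RATIO_THRESHOLD):
    x1, y1, x2, y2 = box
    box_area = max(0, x2 - x1) * max(0, y2 - y1)
    return box_area / (image_w * image_h) >= threshold

# e.g. a cat detected at (100, 80, 500, 420) in a 640x480 image
print(subject_defines_category((100, 80, 500, 420), 640, 480))  # True (ratio ~ 0.44)
```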
Some of the above-described visual tag features may be obtained in the manner shown in Table 2 below:
TABLE 2 (content embedded as images in the source; not reproduced here)
When combined with manual auditing, this process can be used to verify whether existing relationships are correct and to judge whether new relationships need to be added. The process may include having a plurality of raters vote on the same attribute or relationship; if a proportion of raters exceeding a preset threshold agrees with the attribute or relationship, the attribute or relationship is deemed correct. To ensure the quality of the results, the present exemplary embodiment may also calculate rater consistency, that is, compute the ICC (Intraclass Correlation Coefficient) of each rater against the remaining raters; if it exceeds a preset threshold, the rater's results are considered acceptable, otherwise they are rejected. The preset threshold may be set according to actual needs; for example, the ICC threshold may be set to 0.4.
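The voting and consistency check might be sketched as follows; the leave-one-out correlation used here is a simplified stand-in for a full ICC computation, and the thresholds follow the examples in the text:

```python
# Majority voting on a proposed attribute/relationship, plus a per-rater
# consistency score (a simplified stand-in for ICC, not a full computation).
import numpy as np

VOTE_THRESHOLD = 0.6  # illustrative agreement proportion
ICC_THRESHOLD = 0.4   # example value suggested in the text

def relation_accepted(votes):
    """votes: 0/1 judgments from all raters on one attribute/relationship."""
    return sum(votes) / len(votes) >= VOTE_THRESHOLD

def rater_consistency(scores):
    """scores: (n_raters, n_items) array; correlate each rater with the
    mean of the remaining raters, as a rough consistency measure."""
    scores = np.asarray(scores, dtype=float)
    consistency = []
    for i in range(scores.shape[0]):
        rest_mean = np.delete(scores, i, axis=0).mean(axis=0)
        consistency.append(np.corrcoef(scores[i], rest_mean)[0, 1])
    return consistency  # reject raters whose value falls below ICC_THRESHOLD
```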
In an exemplary embodiment, the relationship types include: hypernym, hyponym, related word, inference mapping word, and similar label;
adding edge attributes to edges according to the relationship types between the visual labels comprises:
determining the category information of the edge between the two nodes corresponding to two visual labels according to the relationship type to which the relationship between the two visual labels belongs, and adding the category information of the edge to the edge attributes of that edge.
The category information of the edge between the two nodes corresponding to two visual labels is the information, determined from the defined relationship types, that matches the relationship carried by the edge between the two nodes; it can reflect which relationship type the two nodes belong to, what relationship they have, how strong the correlation is, and so on. The present exemplary embodiment may determine this category information according to the relationship type to which the relationship between the two visual labels belongs, and then add it to the edge attributes of the edge. In the present exemplary embodiment, an edge between two nodes may be directed, with corresponding relationship types such as hypernym or hyponym, or undirected, with corresponding relationship types such as related word.
In an exemplary embodiment, the edge attribute may further include a tag conditional probability, and the knowledge graph construction method may further include:
counting the conditional probability that, given that one of two visual labels appears, the other also appears, to obtain the label conditional probability of the two visual labels, and adding the label conditional probability to the edge attributes of the edge between the two nodes corresponding to the two visual labels.
In the present exemplary embodiment, in addition to qualitatively reflecting the relationship type of the edge between two nodes as described above, the conditional probability that one of two visual labels appears given that the other appears may be obtained by quantitative calculation, and the relationship carried by the edge between the two nodes can then be measured concretely by the label conditional probability of the two visual labels.
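A sketch of counting this label conditional probability from image-level co-occurrence; the image data is invented for illustration:

```python
# Count P(b appears | a appears) over per-image label sets.
def label_conditional_probability(images, a, b):
    """images: iterable of per-image label sets."""
    n_a = sum(1 for labels in images if a in labels)
    n_ab = sum(1 for labels in images if a in labels and b in labels)
    return n_ab / n_a if n_a else 0.0

images = [{"wedding", "balloon"}, {"wedding"}, {"balloon", "party"}]
p = label_conditional_probability(images, "wedding", "balloon")  # 0.5
# In the networkx sketch above this would be stored on the edge, e.g.:
# kg.edges[ids["balloon"], ids["wedding"]]["conditional_probability"] = p
```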
Specific relationship types and descriptions are shown in Table 3 below:
TABLE 3 (content embedded as images in the source; not reproduced here)
In an exemplary embodiment, the relationship types of some edges may be obtained in the manner shown in Table 4 below:
TABLE 4 (content embedded as images in the source; not reproduced here)
In summary, in the present exemplary embodiment, a plurality of visual labels are acquired and the relationships among them are determined; nodes are generated based on the visual labels, and edges between the nodes are generated based on the relationships between the visual labels, so as to construct a visual label knowledge graph; and node attributes are added to the nodes according to the features of the visual labels, while edge attributes are added to the edges according to the relationship types among the visual labels. First, the present exemplary embodiment provides a knowledge graph construction method that, by using visual labels as nodes and the relationships between visual labels as edges, can construct an accurate and reliable visual label knowledge graph, providing a reliable processing basis for the field of visual labels. Second, the present exemplary embodiment adds corresponding attributes for the visual label nodes and for the relationship types between visual labels, further perfecting the visual label knowledge graph and enriching its knowledge information. Third, the present exemplary embodiment can form a visual label knowledge graph for visual label processing through a simple and convenient flow, and has a wide range of application.
After the visual label knowledge graph is generated, on the one hand, it structures the knowledge related to visual labels and can be applied to a variety of application scenarios, improving the utilization efficiency of visual labels and their scene coverage. On the other hand, when node attributes are added to each visual label node according to the features of the visual label, different features achieve different effects. For example, the synonym feature of a visual label can expand the coverage of the label recognition capability without spending additional model computing resources; the error-correction word feature and the associative search word feature can provide error-correction and intent-guessing capabilities for labels; the sentence component feature can provide picture description with controllable content: compared with the prior art of directly generating a whole piece of descriptive text through a caption model (an image captioning model), labels output by a label model and sorted by sentence component are more controllable; and the foreign-language name feature can be used for comparison with other databases or models, for outputting foreign-language labels, and so on. Furthermore, when edge attributes are added to edges according to the relationship types between visual labels, the different relationship types have corresponding effects in application scenarios: the inference mapping word type can expand the coverage of the label recognition capability without additional model computing resources, and the related word type can provide candidates for search and additional recommended words, supplying prior knowledge for the label recognition capability and thereby improving recognition accuracy.
The generated visual label knowledge graph can be applied to scenarios of search-intent association. Specifically, when a user searches with a keyword, the input keyword is not necessarily a visually related word, due to misspelling, differences of expression, and other reasons, or the keyword may differ from the actual search intent. By associating the user's search intent through node attributes in the visual label knowledge graph of this embodiment, such as the error-correction words and associative search words of visual labels, the coverage of search results and the user experience can be greatly improved. For example, based on error-correction words, "identity card" can be determined from "sfz" or from "identity positive"; based on the associative search words of visual labels, a medical test query code can be determined from "medical test", a grade certificate from "grade four", an identity card image from "identity card number", and so on.
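A minimal sketch of such query association through node attributes; the lookup tables merely restate the examples above and are not a real implementation:

```python
# Map a raw query to a visual label via error-correction and associative
# search-word attributes (illustrative tables, not actual graph data).
error_correction_words = {"sfz": "identity card",
                          "identity positive": "identity card"}
associative_search_words = {"grade four": "grade certificate",
                            "identity card number": "identity card"}

def resolve_query(query):
    query = error_correction_words.get(query, query)   # fix misspellings
    return associative_search_words.get(query, query)  # guess the intent

print(resolve_query("sfz"))  # -> "identity card"
```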
The generated visual label knowledge graph can also be applied to recommending additional search words. After a user inputs a search word and obtains results, keywords can be added for further searching, and most such additions are highly correlated with the original keyword, so additional-word recommendation can be performed by means of the visual label knowledge graph. For example, "barbecue" and "hot pot" can be associated on the basis of "food"; "yellow", "backpack", and "satchel" can be associated with "bag"; "gym" and "park" can suggest "running"; and "grassland" and "dog walking" can be associated with "dog".
In addition, the present exemplary embodiment also provides another image description method, as shown in fig. 5, specifically may include the following procedures:
step S510, obtaining a visual tag knowledge graph, wherein the visual tag knowledge graph is constructed by the knowledge graph construction method;
step S520, obtaining a main label of the image to be described according to the result of identifying the image to be described;
step S530, mapping the main label by using the visual label knowledge graph to obtain a mapping label;
step S540, generating description information of the image to be described based on the main label or a label phrase formed by the main label and the mapping label.
An image to be described is an image whose content is to be described through words, keywords, or labels, and it can be any image. For example, in a barrier-free label broadcasting scenario, labels reflecting the image content are usually extracted from the image to be broadcast and announced by voice or other means, providing a reading service for people with special needs; the image to be broadcast is then the image to be described. A main label is some or all of the labels contained in the image to be described; for example, "person", "wedding", "skirt", "clothes", "man", and "woman" may be determined from the image to be described, and these can be broadcast as main labels in the barrier-free broadcasting scenario. Main labels may be nouns as above and, considering the variety of image content, may also be verbs such as "walk a dog" or "walk". The present exemplary embodiment may obtain the initial main labels of the image to be described through a pre-trained image recognition model that processes the image. In general, when the image to be described is complex, the initial main labels will be numerous and disorganized, and outputting them directly may interfere with the user's judgment of what the image actually expresses, giving a poor experience. For example, the barrier-free broadcasting function for visually impaired people needs to broadcast the visual labels contained in an image, such as labels of persons, objects, scenes, or events; when the recognized labels are numerous and disorganized, the user may be unable to acquire the image information directly or quickly.
At this time, the main label may be processed in combination with the visual label knowledge graph obtained in step S510, so as to screen, sort, de-duplicate or score, and the like.
Specifically, after the initial image labels are obtained, the main labels are mapped using the visual label knowledge graph to obtain mapping labels. Mapping labels may include inference labels that have a causal relationship with a main label, such as "walking a dog" mapping to "dog", and may also include labels in a hypernym-hyponym relationship with a main label, such as the upper mapping label "clothing" of "skirt". An obtained mapping label may coincide with an existing main label; for example, the main label "dog" may already have been recognized in the image, while the label "dog" is also inferred from "walking a dog" during mapping. A mapping label may also be new relative to the main labels, in which case it can be kept ready for use. When performing inference mapping, the labels in a hypernym-hyponym relationship with each main label and the related information can be configured into the attributes of the main label, to facilitate subsequent use or to complete the label attributes; for example, the attributes of each main label may include its upper-level label, its lower-level label, its part of speech, and so on.
In the present exemplary embodiment, in order to ensure the rationality of the output labels and make them easy to understand, a main label may be replaced with a user-friendly label after it is obtained; for example, when "poodle" is determined, since users commonly call it "Teddy dog", it may be replaced with "Teddy dog". The mapping labels of the main label are then obtained by inference over the edge attributes of the edges in the visual label knowledge graph.
Finally, the description information of the image to be described can be generated based on the main label or a label phrase formed by the main label and the mapping label, for example, when the mapping label is an upper label of the existing main label, the existing main label can be updated according to the mapping label, and the updated main label is used as the description information of the image to be described; when the mapping label is different from the existing main label and is a new label, the main label and the mapping label can be used as description information of the image to be described; and the description information of the image to be described can be generated based on the label phrase formed by the main label and the mapping label.
In the scene of barrier-free broadcasting labels, broadcasting can be performed based on description information of images to be described, for example, in a label phrase formed by a main label and a mapping label, only the main label is broadcasted, so that the situation that a plurality of similar labels are broadcasted and broadcasting effects are disordered is avoided.
In summary, in the present exemplary embodiment, a visual tag knowledge graph is obtained, and the visual tag knowledge graph is constructed by the knowledge graph construction method described above; according to the result of identifying the image to be described, obtaining a main label of the image to be described; mapping the main label by using the visual label knowledge graph to obtain a mapping label; and generating description information of the image to be described based on the main label or a label phrase formed by the main label and the mapping label. The present exemplary embodiment provides an image description method, which can infer a mapping label of a main label through a logical relationship between labels in a constructed visual label knowledge graph, and determine description information based on the main label or the main label and the mapping label.
In an exemplary embodiment, the step S540 may include:
and acquiring sentence components of the main tag or the mapping tag from the visual tag knowledge graph, and if the sentence components are the main tag or the mapping tag of the predicate, sequencing the main tag or the tag phrase according to the sentence components of the main tag or the mapping tag, and generating description information based on the sequenced main tag or tag phrase.
In general, when describing an image, the description labels may include various parts of speech, such as verbs and nouns, and different words may form different sentence components, such as subjects, predicates, and objects. For example, the description of an image may be "girl reads a book on the lawn", where "girl" is the subject, "reads" is the predicate, "book" is the object, and "on the lawn" is an adverbial. The present exemplary embodiment may obtain the sentence components of the main labels or mapping labels and sort on that basis to generate the description information: when the main labels or mapping labels contain a predicate, the main labels or label phrases may be sorted in the order of subject, predicate, and other components; when they contain no predicate, they may be sorted in random order, or in descending order of subjective score, and so on. In the barrier-free broadcasting scenario, label broadcasting based on the sorted description information preserves the logic of the broadcast and helps the user understand the image information.
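A sketch of this component-based ordering rule; the component ranks and the helper functions component_of() and subjective_score() are illustrative assumptions standing in for knowledge-graph lookups:

```python
# Sort labels by sentence component when a predicate is present, otherwise
# fall back to descending subjective score (ranks are illustrative).
COMPONENT_ORDER = {"subject": 0, "predicate": 1, "verb-object phrase": 2,
                   "object": 3, "adverbial": 4}

def order_labels(labels, component_of, subjective_score):
    if any(component_of(lbl) == "predicate" for lbl in labels):
        # a predicate is present: sort in rough sentence order
        return sorted(labels,
                      key=lambda lbl: COMPONENT_ORDER.get(component_of(lbl), 99))
    # no predicate: highest subjective score first
    return sorted(labels, key=subjective_score, reverse=True)
```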
In an exemplary embodiment, the edge attributes of the edges of the visual label knowledge graph include hypernyms, and the mapping labels include first-level mapping labels; the step S530 may include:
searching, in the visual label knowledge graph, for an outgoing edge of the main label whose edge attributes contain the hypernym relation, and taking the node that the outgoing edge points to as the first-level mapping label of the main label.
In the visual label knowledge graph, some edges between nodes are directed; for example, edges representing hypernym-hyponym relationships can have a direction. When an edge points from node A to node B, the edge connecting the two nodes is an outgoing edge of node A and an incoming edge of node B. In this exemplary embodiment, an outgoing edge of the main label whose edge attributes contain the hypernym relation may be searched for in the visual label knowledge graph, and the node the edge points to is taken as the first-level mapping label of the main label; that is, following the edge relations of the nodes, the node in an upper-level relationship with the current node is found, and this first-level mapping label can be regarded as the hypernym of the current main label.
Considering the different hierarchical relationships, a main label may have a plurality of hypernyms. In an exemplary embodiment, the mapping labels may further include second-level to N-level mapping labels, where N is a positive integer not less than 2; the step S530 may further include:
starting from the first-level mapping label of the main label, mapping each level of mapping label in sequence using the visual label knowledge graph, so as to obtain the next level of mapping label of the main label.
That is, in this exemplary embodiment, the multi-level mapping labels of the current main label may be found in the visual knowledge graph, and which levels are obtained may be set according to actual needs; the levels may be consecutive. For example, if the first-level mapping label of "wedding dress" is "dress", the second-level mapping label is "skirt", and the third-level mapping label is "clothing", then all the upper-level mapping labels of "wedding dress" may be found, or only its second- and third-level mapping labels may be selected, which is not specifically limited in this disclosure.
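Over the graph built in the earlier networkx sketch, multi-level hypernym mapping might be implemented as follows; the "relation" edge attribute and the level cap are assumptions of that sketch:

```python
# Follow outgoing hypernym edges level by level, up to max_levels.
def mapping_labels(kg, node, max_levels=3):
    levels, frontier = [], [node]
    for _ in range(max_levels):
        next_level = [v for u in frontier
                      for _, v, attrs in kg.out_edges(u, data=True)
                      if attrs.get("relation") == "hypernym"]
        if not next_level:
            break
        levels.append(next_level)
        frontier = next_level
    return levels  # e.g. [[pet-node]] when starting from the "cat" node
```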
In an exemplary embodiment, before generating the description information of the image to be described based on the main tag or the tag phrase formed by the main tag and the mapping tag, the image description method may further include:
if the main label does not conform to the word rules of the image description, replacing the main label with its first-level mapping label, provided that the first-level mapping label conforms to the word rules of the image description.
The word rules of the image description are preset rules specifying which words may be output as main labels. Considering that recognition deviation, over-recognition, or missed recognition may occur during image recognition, word rules for the image description can be preset in this embodiment to ensure the rationality of the output words; that is, not every main label can be output to the user for image description. For example, if the image to be described yields the main label "white wedding dress", which does not conform to the word rules of the image description, and the first- to third-level mapping labels of "white wedding dress" include ("wedding dress", "skirt", "clothing"), then, provided the first-level mapping label "wedding dress" conforms to the word rules, the main label may be replaced with it.
In an exemplary embodiment, before generating the description information of the image to be described based on the main tag or the tag phrase formed by the main tag and the mapping tag, the image description method may further include:
the main label is de-duplicated, and the label phrase is not de-duplicated.
Finally, duplicate main labels can be de-duplicated to ensure the validity of the output labels, while label phrases need not be de-duplicated. For example, the main label "evening dress" has the mapping label ("clothes") and the main label "wedding dress" has the same mapping label ("clothes"); "evening dress (clothes)" and "wedding dress (clothes)" need not be de-duplicated.
Fig. 6 shows a flowchart of an image description method in the present exemplary embodiment, which may specifically include:
step S610, obtaining a visual tag knowledge graph;
step S620, obtaining a main label of the image to be described according to the result of identifying the image to be described;
step S630, replacing the main label of the image to be described with a user-friendly label word;
step S640, mapping the main labels using the visual label knowledge graph to obtain upper-level mapping labels or causal mapping labels, where the upper-level mapping labels may include multi-level mapping labels, and the causal mapping labels include labels in a causal reasoning relationship with a main label (for example, "dog" may be inferred from "walking a dog");
Step S650, acquiring sentence components of the main label or the mapping label from the visual label knowledge graph, and judging the sentence components;
step S660, if a main label or mapping label whose sentence component is a predicate exists, sorting the main labels or label phrases according to the sentence components of the main labels or mapping labels;
step S670, if no such main label or mapping label exists, determining the ordering of the main labels or label phrases according to subjective scoring;
step S680, de-duplicating the main labels without de-duplicating the label phrases;
step S690, generating description information of the image to be described based on the main label or the label phrase formed by the main label and the mapping label.
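A high-level sketch tying the steps of fig. 6 together, under the assumptions of the earlier snippets (mapping_labels and order_labels are defined above); recognize() and friendly_word() are hypothetical stand-ins for the recognition model and the user-friendly word replacement, and labels are assumed to double as graph node ids:

```python
def describe_image(image, kg, recognize, friendly_word,
                   component_of, subjective_score):
    main_labels = [friendly_word(lbl) for lbl in recognize(image)]   # S620, S630
    phrases = {lbl: mapping_labels(kg, lbl) for lbl in main_labels}  # S640
    ordered = order_labels(main_labels, component_of, subjective_score)  # S650-S670
    seen, description = set(), []
    for lbl in ordered:                   # S680: de-duplicate main labels only
        if lbl not in seen:
            seen.add(lbl)
            description.append((lbl, phrases[lbl]))  # label phrase
    return description                    # S690: description information
```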
The following illustrates an application scenario of barrier-free label broadcasting. For an image to be described, the main labels shown in Table 5 below may be obtained through an image recognition model:
TABLE 5
Main label        Confidence
Adult             0.777
Person            0.933
Wedding           0.709
Wedding dress     0.846
Skirt             0.846
Clothes           0.846
Man               0.727
Woman             0.705
These main labels are output in random order and may include labels that are in fact mapping labels of other main labels; for example, "skirt" and "clothes" are mapping labels of "wedding dress", and a main label and its mapping labels may share the same confidence.
When the main labels are processed by the above image description method, the processed main labels and label phrases can be obtained by sorting according to the priority order of sentence component, subjective score, and label confidence. When the sentence components include a predicate, sorting may proceed in the order of subject, predicate, and other components; when the sentence components include no predicate, sorting may be based on subjective scoring, whose criteria may be set as needed, for example according to Table 6 below:
TABLE 6
Sentence component                  Score
Subject 1 (mapping 1, mapping 2)    1
Subject 2                           2
Predicate 1                         3
Verb-object phrase 1                4
Object 1 (mapping 3)                5
Object 2                            6
Adverbial 1                         7
The results of table 7 below can be obtained after processing the main label of table 5 above:
TABLE 7
Main label or label phrase        Confidence
Man (person)                      0.727
Woman (person)                    0.705
Adult (person)                    0.777
Wedding dress (skirt, clothes)    0.846
Wedding                           0.709
The "person" can be used as the upper mapping words of "man", "woman" and "adult", and can form tag word groups with "man", "woman" and "adult", and the "skirt" and "clothes" can be used as the upper mapping words of "wedding", and can form tag word groups with "wedding", and the mapping tags can be supplemented after the main tags, and are not listed separately. Further, the voice broadcasting of the main tag can be performed according to the ordered sequence. It should be noted that, the tag phrase may only broadcast the main tag of the non-mapping word, for example, only broadcast "man", "woman", "adult", "wedding", and not broadcast "skirt", "clothes", so it can be seen that in the scene, by integrating the mapping tag with the main tag, the fine-grained main tag is output as the broadcasting tag, and not output the upper mapping tag, so that accuracy and effectiveness of information description in image broadcasting can be further ensured.
Exemplary embodiments of the present disclosure also provide a knowledge graph construction apparatus. As shown in fig. 7, the knowledge graph construction apparatus 700 may include: a tag acquisition module 710, configured to acquire a plurality of visual tags and determine the relationships between the visual tags; a graph generation module 720, configured to generate nodes based on the visual tags and generate edges between the nodes based on the relationships between the visual tags, so as to construct a visual tag knowledge graph; and an attribute adding module 730, configured to add node attributes to the nodes according to the features of the visual tags, and add edge attributes to the edges according to the relationship types between the visual tags.
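The three modules map naturally onto a property-graph library. The following is a minimal sketch using networkx; the attribute keys (sentence_component, relation) are illustrative assumptions rather than a required schema:

```python
import networkx as nx

graph = nx.DiGraph()

# Tag acquisition module 710: visual tags and their pairwise relations.
tags = ["wedding dress", "skirt", "clothes"]
relations = [("wedding dress", "skirt", "hypernym"),
             ("skirt", "clothes", "hypernym")]

# Graph generation module 720: one node per tag, one edge per relation.
graph.add_nodes_from(tags)
graph.add_edges_from((a, b) for a, b, _ in relations)

# Attribute adding module 730: node attributes from tag features,
# edge attributes from relation types.
graph.nodes["wedding dress"]["sentence_component"] = "object"
for a, b, rel in relations:
    graph.edges[a, b]["relation"] = rel
```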
In an exemplary embodiment, the tag acquisition module includes: a tag acquisition unit, configured to acquire candidate tags, the candidate tags including at least one of: labels required by visual services, labels supported by an image classification function, search terms meeting preset search conditions in image search, and words obtained by segmenting image description sentences; and a tag screening unit, configured to screen the visual tags from the candidate tags.
In an exemplary embodiment, the tag screening unit is configured to perform at least one of the following screening modes: filtering out candidate tags without visual features; determining the visual distinction between different candidate tags and, if the visual distinction among several candidate tags is below a visual distinction threshold, de-duplicating those candidate tags; filtering out candidate tags whose commonness is below a preset commonness threshold; and filtering out candidate tags that serve no particular purpose.
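A sketch of the four screening modes is given below; the predicate functions (has_visual_feature, commonness, has_purpose, visually_distinct) are hypothetical stand-ins for whatever upstream signals implement them:

```python
def screen(candidates, has_visual_feature, commonness, has_purpose,
           visually_distinct, commonness_threshold=0.1):
    kept = []
    for tag in candidates:
        if not has_visual_feature(tag):             # mode 1: no visual feature
            continue
        if commonness(tag) < commonness_threshold:  # mode 3: too uncommon
            continue
        if not has_purpose(tag):                    # mode 4: no particular purpose
            continue
        # Mode 2: de-duplicate tags that are not visually distinct
        # from an already-kept tag.
        if any(not visually_distinct(tag, k) for k in kept):
            continue
        kept.append(tag)
    return kept
```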
In an exemplary embodiment, the features of a visual tag include: the category definition of the visual tag, an example image of the visual tag, a visual tag score, the visibility of the visual tag, the frequency with which users use the visual tag, the foreign-language name of the visual tag, the scene of the visual tag, the sentence component to which the visual tag belongs, the part of speech of the visual tag, error-correction words for the visual tag, synonyms of the visual tag, near-synonyms of the visual tag, and associated search terms for the visual tag. The attribute adding module includes a node attribute adding unit, configured to add one or more of the features of a visual tag to the associated data of its node as the node attributes of that node.
In an exemplary embodiment, the knowledge graph construction apparatus further includes a category definition determining unit, configured to identify an object in an image and determine the proportion of the image that the object occupies; if that proportion exceeds a preset threshold, the category definition of the visual tag is determined according to the object.
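For illustration, this rule can be sketched as follows, assuming a detector that returns (object_class, area_ratio) pairs; the interface and threshold value are assumptions:

```python
def category_definition(detections, threshold=0.5):
    # detections: list of (object_class, area_ratio) from image recognition;
    # the first object covering more than the preset threshold names the tag.
    for obj_class, ratio in detections:
        if ratio > threshold:
            return obj_class
    return None

print(category_definition([("sky", 0.2), ("wedding dress", 0.6)]))
# 'wedding dress'
```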
In an exemplary embodiment, the relationship types include: hypernyms, hyponyms, related words, inference mapping words, and similar labels. The attribute adding module includes an edge attribute adding unit, configured to determine the class information of the edge between the two nodes corresponding to two visual labels according to the type of the relationship between the two visual labels, and to add the class information of the edge to the edge attributes of that edge.
In an exemplary embodiment, the edge attributes further include label conditional probabilities. The knowledge graph construction apparatus further includes a conditional probability determining unit, configured to count the conditional probability that, given that one of two visual labels appears, the other also appears, to obtain the label conditional probability of the two visual labels, and to add the label conditional probability to the edge attributes of the edge between the two corresponding nodes.
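This statistic can be estimated from per-image tag sets, as in the sketch below; the counting scheme over co-occurring tags is an assumption consistent with the description:

```python
from collections import Counter
from itertools import permutations

def label_conditional_probs(image_tag_sets):
    single, pair = Counter(), Counter()
    for tags in image_tag_sets:
        tags = set(tags)
        single.update(tags)
        pair.update(permutations(tags, 2))  # ordered (A, B) co-occurrences
    # P(B appears | A appears) = count(A and B) / count(A)
    return {(a, b): pair[a, b] / single[a] for a, b in pair}

probs = label_conditional_probs([{"man", "wedding"}, {"man"}])
# probs[("man", "wedding")] == 0.5, stored as an edge attribute
```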
Exemplary embodiments of the present disclosure also provide an image description apparatus. As shown in fig. 8, the image description apparatus 800 may include: a knowledge graph acquisition module 810, configured to acquire a visual tag knowledge graph constructed by the above knowledge graph construction method; a main label acquisition module 820, configured to obtain the main labels of an image to be described from the result of recognizing that image; a mapping tag acquisition module 830, configured to map the main labels using the visual tag knowledge graph to obtain mapping labels; and a description information acquisition module 840, configured to generate the description information of the image to be described based on the main labels or on label phrases formed from the main labels and their mapping labels.
In an exemplary embodiment, the node attributes of the nodes of the visual tag knowledge graph include sentence components. The description information acquisition module includes a sorting unit, configured to acquire the sentence components of the main labels or mapping labels from the visual tag knowledge graph; if a main label or mapping label whose sentence component is a predicate exists, the main labels or label phrases are sorted according to those sentence components, and the description information is generated from the sorted main labels or label phrases.
In an exemplary embodiment, the edge attributes of the edges of the visual tag knowledge graph include hypernyms, and the mapping labels include first-level mapping labels. The mapping tag acquisition module includes a first mapping unit, configured to search the visual tag knowledge graph for outgoing edges of the main label whose edge attributes include a hypernym, and to take the nodes those outgoing edges point to as first-level mapping labels of the main label.
In an exemplary embodiment, the mapping labels further include second-level to N-level mapping labels, N being a positive integer not less than 2. The mapping tag acquisition module further includes a second mapping unit, configured to start from the first-level mapping labels of the main label and map each level of mapping label in turn through the visual tag knowledge graph, obtaining the next level of mapping labels of the main label.
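Continuing the earlier networkx sketch, first-level and N-level mapping can be written as follows; the relation edge attribute is the same illustrative assumption as above:

```python
def hypernyms(graph, tag):
    # First mapping unit: outgoing edges whose attribute marks a hypernym.
    return [v for _, v, d in graph.out_edges(tag, data=True)
            if d.get("relation") == "hypernym"]

def mapping_levels(graph, main_tag, n):
    # Second mapping unit: expand level by level up to the N-level tags.
    levels, frontier = [], [main_tag]
    for _ in range(n):
        frontier = [h for t in frontier for h in hypernyms(graph, t)]
        if not frontier:
            break
        levels.append(frontier)
    return levels  # levels[0] holds 1-level tags, levels[1] 2-level tags, ...
```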
In an exemplary embodiment, the image description apparatus further includes: the replacing unit is used for replacing the main label with the first-level mapping label of the main label under the condition that the first-level mapping label of the main label accords with the word rule of the image description if the main label does not accord with the word rule of the image description before generating the description information of the image to be described based on the main label or the label phrase formed by the main label and the mapping label.
In an exemplary embodiment, the image description apparatus further includes a de-duplication unit, configured to de-duplicate the main labels, but not the label phrases, before the description information of the image to be described is generated.
The specific details of each part of the above apparatus have already been described in the method embodiments and are therefore not repeated here.
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium, which may be implemented in the form of a program product comprising program code. When the program product runs on a terminal device, the program code causes the terminal device to perform the steps described in the "exemplary method" section above according to the various exemplary embodiments of the present disclosure; for example, any one or more of the steps of fig. 2, 5, or 6 may be performed. The program product may employ a portable compact disc read-only memory (CD-ROM) containing the program code and may run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic or optical signals, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium, other than a readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
Exemplary embodiments of the present disclosure also provide an electronic device. The electronic device may include a processor and a memory for storing executable instructions of the processor, the processor being configured to perform the above-described knowledge graph construction method and image description method via execution of the executable instructions.
The configuration of the electronic device will be described below by way of example, using the mobile terminal 900 of fig. 9. It will be appreciated by those skilled in the art that, apart from components specific to mobile use, the configuration of fig. 9 can also be applied to stationary devices.
As shown in fig. 9, the mobile terminal 900 may specifically include: processor 901, memory 902, bus 903, mobile communication module 904, antenna 1, wireless communication module 905, antenna 2, display 906, camera module 907, audio module 908, power module 909, and sensor module 910.
The processor 901 may include one or more processing units. For example, the processor 901 may include an AP (Application Processor), a modem processor, a GPU (Graphics Processing Unit), an ISP (Image Signal Processor), a controller, an encoder, a decoder, a DSP (Digital Signal Processor), a baseband processor, and/or an NPU (Neural-network Processing Unit), and the like.
An encoder may encode (i.e., compress) an image or video to reduce the data size for storage or transmission, and a decoder may decode (i.e., decompress) the encoded data to recover the image or video. The mobile terminal 900 may support one or more encoders and decoders, for example for image formats such as JPEG (Joint Photographic Experts Group), PNG (Portable Network Graphics), and BMP (Bitmap), and video formats such as MPEG-1, MPEG-2, H.263, H.264, and HEVC (High Efficiency Video Coding).
The processor 901 may form a connection with the memory 902 or other components via the bus 903.
Memory 902 may be used to store computer-executable program code that includes instructions. The processor 901 performs various functional applications of the mobile terminal 900 and data processing by executing instructions stored in the memory 902. The memory 902 may also store application data, such as files that store images, videos, and the like.
The communication functions of the mobile terminal 900 may be implemented by the mobile communication module 904, the antenna 1, the wireless communication module 905, the antenna 2, a modem processor, a baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. The mobile communication module 904 may provide a mobile communication solution of 3G, 4G, 5G, etc. applied on the mobile terminal 900. The wireless communication module 905 may provide wireless communication solutions for wireless local area networks, bluetooth, near field communications, etc. applied on the mobile terminal 900.
The display screen 906 is used to implement display functions such as displaying user interfaces, images, video, and the like. The image capturing module 907 is used to implement capturing functions, such as capturing images, videos, and the like. The audio module 908 is used to implement audio functions such as playing audio, capturing speech, etc. The power module 909 is configured to perform power management functions such as charging a battery, powering a device, monitoring a battery status, and the like. The sensor module 910 may include one or more sensors for implementing corresponding sensing functions. For example, the sensor module 910 may include an inertial sensor for detecting a motion pose of the mobile terminal 900 and outputting inertial sensing data.
Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "system." Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (17)

1. A knowledge graph construction method, characterized by comprising the following steps:
acquiring a plurality of visual labels and determining the relation among the visual labels;
generating nodes based on the visual labels, and generating edges between the nodes based on the relation between the visual labels so as to construct a visual label knowledge graph;
and adding node attributes to the nodes according to the characteristics of the visual labels, and adding edge attributes to the edges according to the relation types among the visual labels.
2. The method of claim 1, wherein the obtaining a plurality of visual tags comprises:
obtaining candidate tags, the candidate tags comprising at least one of: labels required by visual services, labels supported by an image classification function, search terms meeting preset search conditions in image search, and words obtained by segmenting image description sentences;
and screening the visual labels from the candidate labels.
3. The method of claim 2, wherein said screening out said visual labels from said candidate labels comprises at least one of:
filtering candidate labels without visual features from the candidate labels;
Determining visual distinction between different candidate labels, and if the visual distinction between a plurality of candidate labels is lower than a visual distinction threshold, de-duplicating the plurality of candidate labels;
filtering candidate labels with the commonness lower than a preset commonness threshold value from the candidate labels;
candidate tags that do not have a particular use are filtered out of the candidate tags.
4. The method of claim 1, wherein the features of the visual tag comprise: the category definition of the visual tag, an example image of the visual tag, a visual tag score, the visibility of the visual tag, the frequency with which users use the visual tag, the foreign-language name of the visual tag, the scene of the visual tag, the sentence component to which the visual tag belongs, the part of speech of the visual tag, error-correction words for the visual tag, synonyms of the visual tag, near-synonyms of the visual tag, and associated search terms for the visual tag;
the adding node attributes to the nodes according to the features of the visual tag includes:
one or more of the features of the visual tag are added to the associated data of the node as node attributes of the node.
5. The method according to claim 4, wherein the method further comprises:
identifying an object in an image and determining the proportion of the image that the object occupies;
and if the proportion of the object in the image exceeds a preset threshold, determining the category definition of the visual tag according to the object.
6. The method of claim 1, wherein the relationship types comprise: hypernyms, hyponyms, related words, inference mapping words, and similar labels;
the adding the edge attribute to the edge according to the relation type between the visual labels comprises the following steps:
and determining the class information of the edge between the two nodes corresponding to the two visual labels according to the relationship type of the relationship between the two visual labels, and adding the class information of the edge into the edge attribute of the edge.
7. The method of claim 1, wherein the edge attributes further comprise label conditional probabilities; the method further comprises the steps of:
and counting the conditional probability that one of the two visual labels appears and the other visual label appears to obtain the label conditional probability of the two visual labels, and adding the label conditional probability into the edge attribute of the edge between the two nodes corresponding to the two visual labels.
8. An image description method, comprising:
acquiring a visual tag knowledge graph constructed by the knowledge graph construction method of any one of claims 1 to 7;
obtaining a main label of the image to be described according to the result of identifying the image to be described;
mapping the main label by using the visual label knowledge graph to obtain a mapping label;
and generating the description information of the image to be described based on the main label or a label phrase formed by the main label and the mapping label.
9. The method of claim 8, wherein the node attributes of the nodes of the visual tag knowledge-graph comprise sentence components; the generating the description information of the image to be described based on the main label or the label phrase formed by the main label and the mapping label includes:
and acquiring sentence components of the main tag or the mapping tag from the visual tag knowledge graph, and if the main tag or the mapping tag with the sentence components being predicates exists, sequencing the main tag or the tag phrase according to the sentence components of the main tag or the mapping tag, and generating the description information based on the sequenced main tag or tag phrase.
10. The method of claim 8, wherein the edge attribute of the edge of the visual tag knowledge-graph comprises an hypernym; the mapping tag comprises a first-level mapping tag; the mapping the main label by using the visual label knowledge graph to obtain a mapping label comprises the following steps:
searching the visual label knowledge graph for an outgoing edge of the main label whose edge attribute comprises a hypernym, and taking the node corresponding to the outgoing edge as a first-level mapping label of the main label.
11. The method of claim 10, wherein the mapping tags further comprise a two-level mapping tag to an N-level mapping tag, N being a positive integer not less than 2; the mapping is performed on the main label by using the visual label knowledge graph to obtain a mapping label, and the method further comprises the following steps:
and mapping each level of mapping label of the main label by using the visual label knowledge graph from the level of mapping label of the main label to obtain the next level of mapping label of the main label.
12. The method according to claim 10, wherein before generating the description information of the image to be described based on the main label or a label phrase formed by the main label and the mapping label, the method further comprises:
if the main label does not conform to the wording rules of the image description, replacing the main label with its first-level mapping label, provided that the first-level mapping label conforms to the wording rules of the image description.
13. The method according to claim 10, wherein before generating the description information of the image to be described based on the main label or a label phrase formed by the main label and the mapping label, the method further comprises:
and de-duplicating the main label and not de-duplicating the label phrase.
14. A knowledge graph construction apparatus, characterized by comprising:
the label acquisition module is used for acquiring a plurality of visual labels and determining the relation among the visual labels;
the map generation module is used for generating nodes based on the visual labels, and generating edges between the nodes based on the relation between the visual labels so as to construct a visual label knowledge map;
and the attribute adding module is used for adding node attributes to the nodes according to the characteristics of the visual labels and adding edge attributes to the edges according to the relationship types among the visual labels.
15. An image description apparatus, comprising:
A knowledge graph acquisition module for acquiring a visual tag knowledge graph constructed by the knowledge graph construction method according to any one of claims 1 to 7;
the main label acquisition module is used for acquiring a main label of the image to be described according to the result of identifying the image to be described;
the mapping tag acquisition module is used for mapping the main tag by using the visual tag knowledge graph to obtain a mapping tag;
and the descriptive information acquisition module is used for generating descriptive information of the image to be described based on the main label or a label phrase formed by the main label and the mapping label.
16. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the knowledge-graph construction method of any one of claims 1 to 7 and the image description method of any one of claims 8 to 13.
17. An electronic device, comprising:
a processor;
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the knowledge-graph construction method of any one of claims 1 to 7 and the image description method of any one of claims 8 to 13 via execution of the executable instructions.
CN202310300518.6A 2023-03-23 2023-03-23 Knowledge graph construction method, knowledge graph construction device, medium and electronic equipment Pending CN116304104A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310300518.6A CN116304104A (en) 2023-03-23 2023-03-23 Knowledge graph construction method, knowledge graph construction device, medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310300518.6A CN116304104A (en) 2023-03-23 2023-03-23 Knowledge graph construction method, knowledge graph construction device, medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116304104A true CN116304104A (en) 2023-06-23

Family

ID=86835848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310300518.6A Pending CN116304104A (en) 2023-03-23 2023-03-23 Knowledge graph construction method, knowledge graph construction device, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116304104A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116662634A (en) * 2023-08-02 2023-08-29 中国标准化研究院 Knowledge graph-based path analysis reasoning research system and method
CN116662634B (en) * 2023-08-02 2023-10-31 中国标准化研究院 Knowledge graph-based path analysis reasoning research system and method

Similar Documents

Publication Publication Date Title
WO2023065211A1 (en) Information acquisition method and apparatus
CN113254711B (en) Interactive image display method and device, computer equipment and storage medium
CN112215837A (en) Multi-attribute image semantic analysis method and device
CN110619051A (en) Question and sentence classification method and device, electronic equipment and storage medium
CN111783712A (en) Video processing method, device, equipment and medium
JP2014093058A (en) Image management device, image management method, program and integrated circuit
CN112052333A (en) Text classification method and device, storage medium and electronic equipment
CN114550070A (en) Video clip identification method, device, equipment and storage medium
CN115114395A (en) Content retrieval and model training method and device, electronic equipment and storage medium
CN116304104A (en) Knowledge graph construction method, knowledge graph construction device, medium and electronic equipment
CN113515589A (en) Data recommendation method, device, equipment and medium
CN116186197A (en) Topic recommendation method, device, electronic equipment and storage medium
CN117725220A (en) Method, server and storage medium for document characterization and document retrieval
CN116881462A (en) Text data processing, text representation and text clustering method and equipment
CN116977701A (en) Video classification model training method, video classification method and device
Sarridis et al. Leveraging large-scale multimedia datasets to refine content moderation models
US20150052155A1 (en) Method and system for ranking multimedia content elements
CN115168609A (en) Text matching method and device, computer equipment and storage medium
CN115186085A (en) Reply content processing method and interaction method of media content interaction content
CN114626480A (en) Multi-source heterogeneous data feature extraction device and method, storage medium and electronic equipment
CN114449342A (en) Video recommendation method and device, computer readable storage medium and computer equipment
CN114443904A (en) Video query method, video query device, computer equipment and computer readable storage medium
CN111046232B (en) Video classification method, device and system
US20220207076A1 (en) Generative image acquisition
CN113641766A (en) Relationship identification method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination