CN108959304B - Label prediction method and device - Google Patents

Label prediction method and device

Info

Publication number
CN108959304B
CN108959304B (application CN201710363676.0A)
Authority
CN
China
Prior art keywords
image
label
prediction
image data
images
Prior art date
Legal status
Active
Application number
CN201710363676.0A
Other languages
Chinese (zh)
Other versions
CN108959304A (en)
Inventor
魏溪含
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201710363676.0A priority Critical patent/CN108959304B/en
Publication of CN108959304A publication Critical patent/CN108959304A/en
Application granted granted Critical
Publication of CN108959304B publication Critical patent/CN108959304B/en

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose a label prediction method and a label prediction device. The method comprises the following steps: acquiring at least one image dataset, wherein the images in each image dataset belong to the same category; performing label prediction on the images in the image dataset within a preset image label range to generate at least one prediction label for each image; and counting the number of occurrences of each prediction label, acquiring the images corresponding to any prediction label whose count meets a preset condition, and setting the label of the image dataset to which those images belong as the prediction label. With the embodiments of the present application, the accuracy and merging efficiency of label prediction can be improved.

Description

Label prediction method and device
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a method and an apparatus for label prediction.
Background
In recent years, with the rapid development of science and technology, demand for intelligent products in daily life keeps growing. Features such as "searching for pictures with pictures" and "searching for text with pictures" are no longer unfamiliar to users: many online shopping platforms, search platforms, and the like can, given a picture input by a user, find pictures of the same category or similar to it, and can even identify the content shown in the picture. In some examples, a picture of a cat input by a user is used to retrieve similar cat pictures, or to obtain information such as the breed of the cat in the picture.
In order to ensure that a user can find pictures or text related to an input picture, a platform providing services such as "searching for pictures with pictures" and "searching for text with pictures" usually needs massive picture data resources. When constructing a picture data resource, each picture is usually tagged with a label that represents the category of the picture within the data resource, so that picture resources can be managed more effectively; example labels include "British Shorthair", "gardenia", "keyboard", and the like. A service platform naturally wants the number of pictures under each label to be as large as possible, so it needs to collect pictures from other picture data resources and merge them into its own. Pictures in those other resources also carry label information, but different platforms follow different rules for setting picture labels. For example, when the label language of pictures on a foreign data platform differs from the target language and translation software is used to translate the labels, phenomena such as polysemy and ambiguous word senses arise. These phenomena mean that some picture labels cannot be merged into the platform's existing picture data resources. For example, the Google Open Images set includes many pictures labeled "comics"; if the target language is Chinese, translating "comics" into Chinese may yield several expressions such as "comic", "comic book", and "comic character". If the existing picture data platform already includes the three labels "comic", "comic book", and "comic character", the prior art cannot determine which of these labels "comics" should be merged into.
To address this problem, the prior art typically determines whether a picture label can be merged with an existing label by manual observation. For example, a number of the pictures labeled "comics" in Google Open Images may be opened and inspected manually to decide whether "comics" corresponds to "comic", "comic book", or "comic character". This manual approach entails a heavy workload and low working efficiency.
Therefore, a need exists in the art for a more accurate and intelligent image tag merging method.
Disclosure of Invention
The embodiment of the application aims to provide a label prediction method and a label prediction device, which can improve the accuracy and merging efficiency of label prediction.
The method and the device for label prediction provided by the embodiment of the application are specifically realized as follows:
a label prediction method, comprising:
acquiring at least one image dataset, wherein images in the image datasets belong to the same category;
respectively performing label prediction on the images in the image data set within a preset image label range to generate at least one prediction label of each image;
and respectively counting the occurrence frequency of each prediction label, acquiring the image corresponding to the prediction label with the frequency meeting the preset condition, and setting the label of the image data set to which the image belongs as the prediction label.
A label prediction method, comprising:
acquiring at least one image dataset, wherein images in the image datasets belong to the same category;
respectively performing label prediction on the images in the image data set within a preset image label range to generate at least one prediction label of each image;
and respectively counting the number of the images corresponding to the prediction labels, and setting the label of the image data set to which the image belongs as the prediction label when the number meets a preset condition.
A label prediction method, comprising:
acquiring a plurality of images belonging to the same category;
performing label prediction on the plurality of images by using a prediction model, and generating at least one prediction label for each image;
and counting the occurrence times of a single prediction label, and taking the prediction label with the occurrence times meeting a preset condition as a recommendation label of the plurality of images.
A tag prediction apparatus comprising a processor and a memory for storing processor-executable instructions, the instructions when executed by the processor result in:
acquiring at least one image dataset, wherein images in the image datasets belong to the same category;
respectively performing label prediction on the images in the image data set within a preset image label range to generate at least one prediction label of each image;
and respectively counting the occurrence frequency of each prediction label, acquiring the image corresponding to the prediction label with the frequency meeting the preset condition, and setting the label of the image data set to which the image belongs as the prediction label.
A tag prediction apparatus comprising a processor and a memory for storing processor-executable instructions, the instructions when executed by the processor result in:
acquiring at least one image dataset, wherein images in the image datasets belong to the same category;
respectively performing label prediction on the images in the image data set within a preset image label range to generate at least one prediction label of each image;
and respectively counting the occurrence frequency of each prediction label, acquiring the image corresponding to the prediction label with the frequency meeting the preset condition, and setting the label of the image data set to which the image belongs as the prediction label.
A tag prediction apparatus comprising a processor and a memory for storing processor-executable instructions, the instructions when executed by the processor result in:
acquiring a plurality of images belonging to the same category;
performing label prediction on the plurality of images by using a prediction model, and generating at least one prediction label for each image;
and counting the occurrence times of a single prediction label, and taking the prediction label with the occurrence times meeting a preset condition as a recommendation label of the plurality of images.
A computer readable storage medium having stored thereon computer instructions that, when executed, perform the steps of:
acquiring at least one image dataset, wherein images in the image datasets belong to the same category;
respectively performing label prediction on the images in the image data set within a preset image label range to generate at least one prediction label of each image;
and respectively counting the occurrence frequency of each prediction label, acquiring the image corresponding to the prediction label with the frequency meeting the preset condition, and setting the label of the image data set to which the image belongs as the prediction label.
According to the label prediction method and device, image data sets of the same category can be merged into an initial image data source. During the data merging process, label prediction is first performed on the images in the image data sets, and the resulting prediction labels fall within the label range of the initial image data source. Whether to merge the corresponding image data sets is then determined according to the occurrence counts of the prediction labels. This embodiment approaches the problem from the opposite direction of the conventional art: it first determines whether the images belong to the same category and only then merges the labels of the corresponding image data sets. This improves the accuracy of image data merging; moreover, because data are merged by setting labels, large-scale image data can be migrated rapidly, improving the merging efficiency for large-scale data.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in the present application, and those skilled in the art can derive other drawings from them without any creative effort.
FIG. 1 is a schematic diagram of an application scenario provided herein;
FIG. 2 is a schematic diagram of an application scenario provided herein;
FIG. 3 is a schematic method flow diagram of one embodiment of a label prediction method provided herein;
FIG. 4 is an exemplary schematic diagram of a new image data source merged with an initial image data source as provided herein;
FIG. 5 is a schematic diagram of a BP network topology provided by the present application;
FIG. 6 is a schematic diagram of an application scenario provided herein;
FIG. 7 is a schematic diagram of an application scenario provided herein;
FIG. 8 is a schematic method flow diagram of another embodiment of a tag prediction method provided herein;
FIG. 9 is a schematic method flow diagram of another embodiment of a tag prediction method provided herein;
FIG. 10 is a schematic block diagram of an embodiment of a tag prediction apparatus provided in the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For the convenience of those skilled in the art to understand the technical solutions provided in the embodiments of the present application, a technical environment for implementing the technical solutions is described below.
At present, many search engines accept not only text and voice input but also information such as images. The development of search engines has always tracked user needs: users first wanted to look up a word they saw or heard, and later wanted to retrieve information about a piece of music they heard, and both of these demands have since been met. Today, users want to be able to search for information about any picture they see. In a typical scenario, a user is attracted by a snowy mountain scene on a poster but does not know where it is; the user photographs the scene shown on the poster and inputs the picture into a search engine that supports picture search, hoping to find the location of the snowy mountain. Suppose the snowy mountain is actually in a town in Switzerland, but the background database of the search engine has no snowy mountain image resources for that town; the search engine is then likely to fail to identify the location, and may even wrongly report that the scene is in a popular scenic spot such as the mountains in Xinjiang. Similarly, due to insufficient background image resources, there are many situations in which a search engine cannot recognize the information of an image input by a user, or even recognizes wrong information. Wrong information misleads the user, who may then rate the service quality of the whole search engine negatively, possibly even producing a crisis of user trust.
In order to solve the above problems, the background database of the image search engine needs to be continuously expanded with image resources. As described above, in the process of expanding image resources, the problem of non-uniform expression of image tags may occur, so that image data resources on other platforms cannot be accurately merged into image data resources of a search engine. In the prior art, whether the image labels of different expression modes have the same meaning is judged in a manual observation mode, but the mode needs to consume more cost and has lower efficiency and cannot keep pace with the image resource updating speed.
Based on actual technical requirements similar to those described above, the label prediction method provided by the present application can perform deep learning on the image resources already present in a search engine and train a model of the relationship between labels and images. Afterwards, when image data from other platforms is input into the model, the labels corresponding to that image data can be predicted. According to the number of images corresponding to each predicted label, it can be determined whether to merge the two labels of the images, that is, whether to merge the image resources of the two platforms.
The technical scheme provided by the embodiment of the application can be applied to various platforms, and a plurality of application platforms of the technical scheme are simply introduced below.
One straightforward application platform is an image search service platform that provides image search. On such a platform, a user can input any image into the search engine it provides, and the platform matches images similar to the input image, or related information about the input image. For example, in the scenario from the technical environment described above, the user enters the photograph of the snowy mountain scene on the poster into the search engine, which can then immediately match related information about the photograph. As shown in fig. 1, the related information may be shown in the form of tags; for the snow-scene photograph of fig. 1, the search engine may output tags such as "Switzerland", "snow mountain", "Alps", "Charles peak", "winter", and "sky". Of course, the related information may also be presented in the form of sentences and the like; the present application is not limited herein. With the technical solution of the present application, massive image resources can be expanded rapidly and accurately into the background database of a search engine, so that a user searching with an input image obtains accurate related information.
Another application scenario is user photo management. With the rapid development of cloud technology, the amount of data generated in the cloud grows day by day, and users' personal photos are an important component of it. In the prior art, management schemes for personal albums exist both on the user's client device and in the cloud, but they are simple, classifying photos only along coarse macroscopic dimensions such as time, people, and places. With the technical solution provided by the present application, existing scene-category images can be deeply learned to generate a relationship model between scene-category labels and the images under those labels. Based on this relationship model, a user's pictures and photos can be classified by scene. FIG. 2 is a schematic diagram of a user interface for managing user albums by scene category according to the present application. As shown in fig. 2, according to the technical solution of the present application, the user's photos can be divided into several large categories such as indoor, outdoor, and caricatures, and each large category can be further divided into sub-categories; for example, indoor scenes can be divided into scenes such as residence, office, coffee shop, and market, and likewise outdoor scenes can be divided into scenes such as mountaineering, sea, and garden. Scene-based classification greatly helps users find photos. For example, suppose a user needs to find a photo taken at the seaside within the last two years, but has forgotten the specific time. With existing album management, the user would have to open old albums and browse for a long time before finding the photo, which is very inconvenient. If the personal album is managed by scene as shown in fig. 2, the user only needs to remember the rough scene, such as "sea", to find the corresponding photo quickly, greatly improving search efficiency.
The classification method of the scenes is not limited to the above example, and each subclass may be directly used as a main class, and the classes may be arbitrarily combined, which is not limited herein.
The technical solution of the present application can also be applied to scenarios such as entertainment-oriented image recognition and security surveillance; the application does not limit the choice of application scenario. In summary, these application scenarios rest on the technical solution provided by the present application: massive image data resources are merged quickly and accurately into existing data resources, making the existing image data resources richer and providing a data basis for various image services.
The label prediction method described in the present application is introduced below with reference to fig. 3. Fig. 3 is a schematic method flow diagram of an embodiment of a label prediction method provided in the present application. Although the present application provides method steps as shown in the following examples or figures, the method may include more or fewer steps based on conventional or non-inventive effort. Where steps have no logically necessary causal relationship, their order of execution is not limited to that provided in the embodiments of the present application. In an actual label prediction process, the steps may be executed sequentially or in parallel (for example, on parallel processors or in a multi-threaded environment) according to the method shown in the embodiments or figures.
In particular, an embodiment of the label prediction method provided in the present application is shown in fig. 3, where the method may include:
s31: at least one image dataset is acquired, the images in the image datasets belonging to the same category.
In this embodiment, the at least one image dataset may serve as an image data source, with the images in each image dataset belonging to the same category. The images may include not only still images such as photographs and pictures, but also moving images such as GIF animations. In some embodiments, the image dataset may be derived from an image database such as the Google Open Images database, MIT scene data, ImageNet data, and the like, where each image contained in the image database has an image tag. The image tag may be used to describe key features of the corresponding image, and in specific form may include at least one phrase, word, or the like; within the image data source, the image tag may be used to access the image corresponding to it.
In this embodiment, since the images in the image dataset belong to the same category, the images in the image dataset may share at least one identical image tag, such as "British Shorthair", "gardenia", or "keyboard". Of course, in other embodiments, the images in the image dataset may carry no image tags at all, as long as it can be determined that they belong to the same category. For example, based on the source information of a group of photographs, it may be determined that they were taken of the same object (e.g., a model, a sunrise) at the same location during consecutive time periods, and it may therefore be concluded that the group of photographs belongs to the same category. Thus, in this embodiment, whether the images in the image dataset belong to the same category can be determined not only from image tag information but also from other identifying information; the application is not limited herein.
The image dataset is not limited to the large public databases mentioned above; it may also include image data resources created by users, for example a user's personal photo album, obtained with the user's consent, containing classified photo sets the user has established. The present application does not limit the source of the image dataset.
S32: and respectively performing label prediction on the images in the image data set within a preset image label range to generate at least one prediction label of each image.
In this embodiment of the present application, the at least one image dataset may be merged into an existing initial image data source. The initial image data source may have a preset image label range, that is, it may include a plurality of established labels, and merging the image dataset, as a new image data source, with the initial image data source means: selecting, from the preset image label range, an image label that matches the images in the image dataset. Fig. 4 is an exemplary schematic diagram of merging a new image data source (i.e., the at least one image dataset) with an initial image data source provided by the present application. As shown in fig. 4, the initial image data source may include image datasets under labels such as "sky", "sea", "mall", and "bar", while the new image data source may include image datasets under labels such as "アニメ" (Japanese for "anime"), "shopping", "beach", and "Pubs". Image labels in the new image data source may thus be expressed in multiple languages. From their literal meanings, "shopping" and "mall", "beach" and "sea", "Pubs" and "bar" may each belong to the same category, i.e., the labels of the two image datasets might be merged, but many uncertain factors remain. For example, the image dataset corresponding to "shopping" may depict supermarket shopping scenes, while the image dataset corresponding to "mall" may depict shopping for clothes, shoes, hats, jewelry, and the like in a shopping mall, in which case merging the two image datasets is clearly inappropriate. Therefore, merging labels purely according to their literal meanings is prone to error.
In this embodiment, the initial image data source is established in advance, and in the establishment process, a required image tag may be determined in advance, and in the selection of the image tag, the setting may be performed according to actual business requirements. For example, for business requirements for classifying user albums according to scene classification, scene class labels, such as "office", "residence", "mall", "movie theater", "cafe", "wedding", "cruise", and the like, may be set. After the corresponding image tags are set, the images under the image tags can be filled, specifically, the images related to the preset image tags can be searched through a search engine, and then the searched images are cleaned and screened, so that the images are matched with the image tags. And finally, printing a corresponding image label on the matched image, namely generating the initial image data source.
In this embodiment, after the initial image data source is constructed, the initial image data source may be subjected to deep learning to obtain a relationship model between an image tag and an image. The specific learning manner may include:
SS 1: acquiring image samples of a plurality of known image labels.
SS 2: performing deep learning processing on the image samples of the plurality of known image labels to obtain a relationship model between the image labels and the images.
In this embodiment, a plurality of image samples with known image labels may be obtained from the initial image data source, and deep learning may be performed on these image samples to obtain a relationship model between image labels and images. In some embodiments, the image samples may be learned using a convolutional neural network algorithm. Specifically, in the deep learning process using a convolutional neural network algorithm, an initial relationship model between image labels and images may be set, where the relationship model takes an image sample as input data and the image label of the image sample as output data. Training parameters are set in the relationship model, and the deep learning process is the process of optimizing these training parameters. In this embodiment, a large number of image samples with known image labels may be obtained from the existing image data source; by continuously inputting the image samples and their image labels into the relationship model, the training parameters can be continuously optimized and the output accuracy of the relationship model improved, until the relationship model meets a preset requirement. The preset requirement may include, for example, maximizing a preset objective function, or a model accuracy not less than a certain threshold. Of course, in other embodiments, the image samples may instead be learned using an auto-encoder algorithm, a sparse auto-encoder algorithm, a restricted Boltzmann machine algorithm, or a deep belief network algorithm; the present application does not limit the method of deep learning.
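As a concrete illustration of the training process just described (and formalized below), the following is a minimal sketch in Python using PyTorch. The dataset object initial_image_source, the model, and all hyperparameters are illustrative assumptions, not part of the patent's disclosure.

```python
# Minimal sketch, assuming PyTorch and a dataset `initial_image_source` that
# yields (image_tensor, label_index) pairs from the initial image data source.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_relation_model(model: nn.Module, initial_image_source, epochs: int = 10):
    loader = DataLoader(initial_image_source, batch_size=64, shuffle=True)
    # Minimizing cross-entropy over (image, label) pairs is equivalent to
    # maximizing the log-likelihood objective L(theta, D) given below.
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            logits = model(images)            # forward pass: label scores per image
            loss = criterion(logits, labels)  # compare with the known image labels
            loss.backward()                   # error back-propagation (BP)
            optimizer.step()                  # adjust the training parameters theta
    return model
```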
The following illustrates, without limitation, an example of deep learning of image samples with a convolutional neural network algorithm. The deep learning process is mainly to train a parameter theta, set an objective function of the deep learning as L (theta, D), when the objective function L (theta, D) is maximized, the parameter theta is an optimal parameter, and when theta is the optimal parameter, a relation model between an image label and an image can be obtained. The expression of the objective function L (θ, D) may include:
L(θ, D) = Σ_i log P(Y = y^(i) | x^(i), θ), summing over all images i in the initial image data source D,

wherein L(θ, D) represents the likelihood function of the initial image data source D on a model containing the parameter θ; θ is the training parameter to be learned by the neural network; D represents the initial image data source; i indexes the ith image in the initial image data source; x^(i) represents the sample representation of the ith image, such as a matrix of pixel gray-scale values; y^(i) represents the image label of the ith image; Y represents the whole label set corresponding to the initial image data source; and P(Y = y^(i) | x^(i), θ) represents the conditional probability, i.e., the probability of predicting the image label y^(i) for an image sample x^(i) from the initial image data source under the current parameter θ.
In this embodiment, the training parameter θ may be learned by using an Error Back Propagation (BP) algorithm. The BP algorithm may use a gradient search technique based on Delta learning rules to achieve a minimization of the mean square error of the actual output of the network from the desired output. The neural network learning process can be understood as a process of propagating backward and correcting weights, and the BP network topology is shown in fig. 5. The BP algorithm is essentially the problem of solving the minimum value of an error function, and the algorithm can adopt a steepest descent method in nonlinear programming to modify weight coefficients according to the negative gradient direction of the error function.
To illustrate the BP algorithm, an error function E is first defined. Taking the sum of the squares of the difference between the desired output and the actual output as an error function, then there is:
E = (1/2) Σ_k (t_k − o_k)², summing over the output units k,

wherein t_k is the target value of output unit k for a training sample x^(i), which in this embodiment is the representation of the image label y^(i) corresponding to the training sample x^(i), and o_k is the actual output value of unit k for the given training sample x^(i).
Subsequently, the error function E may be calculated by using a gradient descent method, and the like, and the weight coefficient of the network topology shown in fig. 5 may be modified according to the negative gradient direction of the error function.
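For concreteness, the following NumPy sketch performs one such weight correction for a small two-layer sigmoid network with the squared-error function E; the network shape, names, and learning rate are illustrative assumptions rather than the exact topology of fig. 5.

```python
# Minimal BP sketch (assumed two-layer sigmoid network, E = 0.5*sum((t-o)**2)).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_step(W1, W2, x, t, lr=0.1):
    # Forward pass
    h = sigmoid(W1 @ x)                        # hidden-layer activations
    o = sigmoid(W2 @ h)                        # actual outputs o_k
    # Backward pass: Delta-rule error terms for the squared-error function E
    delta_o = (o - t) * o * (1 - o)            # output-layer error
    delta_h = (W2.T @ delta_o) * h * (1 - h)   # error propagated back to hidden layer
    # Steepest descent: modify weights along the negative gradient of E
    W2 -= lr * np.outer(delta_o, h)
    W1 -= lr * np.outer(delta_h, x)
    return W1, W2
```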
After the relationship model between image labels and images (i.e., the training parameter θ) is obtained by deep learning as described above, label prediction can be performed on the images in the image dataset using the relationship model: after any image is input into the relationship model, the image label corresponding to that image is obtained by calculation. Therefore, in an embodiment of the present application, the performing label prediction on the plurality of images within a preset label range may include:
and performing label prediction on the images in the image data set by using the relational model respectively to generate at least one prediction label of each image, wherein the prediction label is contained in the image label of the image sample.
In this embodiment, label prediction may be performed on the images using the relationship model, generating prediction labels for the plurality of images. An image may have one or more prediction labels; for example, if an image shows fireworks at night, its prediction labels may include both "night" and "fireworks", and both of these labels are contained in the image labels of the image samples that participated in the deep learning.
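A sketch of what such prediction could look like in code follows, continuing the PyTorch assumption above; the mapping label_names from class indices to the preset image label range, and the choice of top_k, are illustrative.

```python
# Minimal sketch of label prediction with the trained relationship model.
import torch

@torch.no_grad()
def predict_labels(model, image_tensor, label_names, top_k=3):
    model.eval()
    probs = torch.softmax(model(image_tensor.unsqueeze(0)), dim=1)[0]
    confidences, indices = probs.topk(top_k)
    # Every returned label is drawn from the preset image label range,
    # i.e., the image labels of the initial image data source.
    return [(label_names[i], float(c))
            for i, c in zip(indices.tolist(), confidences.tolist())]
```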
In another embodiment of the present application, if the amount of image data in an image data set is large, the image data set may be sampled and then subjected to label prediction. Based on this, the performing label prediction on the plurality of images within the preset image label range may include:
SS-1: and sampling the plurality of images according to a preset rule.
And (4) SS-2: and respectively carrying out label prediction on the sampled images within a preset image label range.
In this embodiment, the plurality of images may be sampled according to a preset rule; the preset rule may, for example, trigger sampling when the number of images exceeds a certain threshold. Suppose 2000 images are known to belong to the label "rock climbing": performing label prediction on all 2000 images would require a large amount of computation, and since the 2000 images are already known to belong to the image label "rock climbing", they can instead be sampled randomly. If, say, 80 images are sampled for label prediction, the label prediction workload is greatly reduced and the label prediction efficiency improved.
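A minimal sketch of this sampling rule follows; the trigger threshold and sample size are illustrative assumptions.

```python
# Minimal sampling sketch: predict on a random sample when the set is large.
import random

def sample_for_prediction(images, trigger_threshold=500, sample_size=80):
    if len(images) <= trigger_threshold:
        return list(images)                    # small set: predict on every image
    return random.sample(images, sample_size)  # large set: e.g., 80 of 2000 images
```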
In another embodiment of the present application, the confidence levels corresponding to the prediction labels may be calculated respectively, and whether to count the image may be selected according to the confidence levels. Based on this, the performing label prediction on the plurality of images within the preset image label range may include:
SSS 1: performing label prediction on the plurality of images within a preset image label range to generate at least one prediction label for each image;
SSS 2: calculating the confidence of each prediction label separately.
In this embodiment, besides performing label prediction on each image, the confidence corresponding to each prediction label may be calculated separately. The confidence may be used to represent the degree of matching between the prediction label and the image: the better the prediction label matches the image, the higher the confidence value. As shown in fig. 6, the input image is a night scene of fireworks over the Sydney Opera House, and a plurality of prediction labels such as "fireworks", "night scene", "Sydney Opera House", "lake water", and "cruise ship" can be predicted using the relationship model provided by the present application, but the confidences corresponding to these prediction labels differ; for example, the confidence of "night scene" may be 97, the highest, while the lowest confidence among them may be 63. Using the confidences corresponding to the prediction labels, it can later be determined whether an image is included when counting the number of images corresponding to each prediction label.
S33: and respectively counting the occurrence frequency of each prediction label, acquiring the image corresponding to the prediction label with the frequency meeting the preset condition, and setting the label of the image data set to which the image belongs as the prediction label.
In this embodiment, the number of occurrences of each prediction label may be counted separately; when the count satisfies a preset condition, the images corresponding to that prediction label may be obtained, and the label of the image dataset corresponding to those images may be set as the prediction label. In an embodiment of the present application, the preset condition may include, for example, at least one of the following:
the number of times is greater than a first threshold;
the proportion of the times to the total occurrence times of all the predicted labels is greater than a second threshold value;
the count ranks within the first third-threshold positions when the prediction labels are sorted by occurrence count in descending order.
In this embodiment, when the count of a prediction label satisfies one of the above conditions, the image dataset may be merged into the initial image dataset corresponding to that prediction label. For the first preset condition, the count of the prediction label must exceed a first threshold. For example, label prediction is performed on 2000 images belonging to the label "rock climbing"; if 1800 of those images receive the prediction label "climbing", the count of the prediction label "climbing" is 1800. If the first threshold is set to 1750, then since 1800 > 1750, the image dataset labeled "rock climbing" may be merged with the image dataset labeled "climbing" in the initial image data source. For the second preset condition, the proportion of the prediction label's count to the total count of all prediction labels must exceed a second threshold. If 1800 of the 2000 images corresponding to the label "rock climbing" receive the prediction label "climbing", and assuming each image has only one prediction label, then "climbing" accounts for 1800/2000 = 90% of the total count; if the second threshold is set to 85%, the image dataset corresponding to "rock climbing" may be merged with the image dataset corresponding to "climbing". For the third preset condition, the count must rank within the first third-threshold positions when prediction labels are ordered from most to least frequent. If label prediction is performed on image sets of multiple categories, yielding multiple prediction labels, these labels may be sorted by count from most to least; if the third threshold is set to 6, the image datasets corresponding to the prediction labels ranked in the first 6 positions may be merged.
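The statistics step can be sketched as follows; the three threshold values mirror the examples above, and treating the predictions as one flat list of labels is an assumption. Any one of the conditions may also be used on its own.

```python
# Minimal sketch: count prediction-label occurrences and test the three
# example preset conditions.
from collections import Counter

def labels_meeting_condition(all_predictions, first=1750, second=0.85, third=6):
    counts = Counter(all_predictions)        # occurrences of each prediction label
    total = sum(counts.values())
    top_ranked = {label for label, _ in counts.most_common(third)}
    qualified = []
    for label, n in counts.items():
        if (n > first                        # condition 1: count above first threshold
                or n / total > second        # condition 2: share above second threshold
                or label in top_ranked):     # condition 3: within the first `third` ranks
            qualified.append(label)
    return qualified
```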
In an embodiment of the application, acquiring the images corresponding to a prediction label whose count satisfies a preset condition, and setting the label of the image dataset to which those images belong as the prediction label, may include:
SSS_1: acquiring the images corresponding to the prediction label whose count meets the preset condition;
SSS_2: screening out, from those images, the images that belong to the same category of image dataset and number no less than a fourth threshold;
SSS_3: setting the label of the image dataset to which the screened images belong as the prediction label.
In this embodiment, when label prediction is performed on images from image datasets of different categories, and each image may receive multiple prediction labels, images from different categories of image datasets may end up with the same prediction label. For example, image dataset A belongs to the category of cruise ships, and the prediction labels of its image 1 include "cruise ship", "fireworks", and "night scene", while image dataset B belongs to the category of lake water, and the prediction labels of its image 2 include "lake water", "swan", and "cruise ship". When the count of the prediction label "cruise ship" is tallied, most of the corresponding images turn out to come from image dataset A, but image 2 from image dataset B is also included. If the count of the prediction label "cruise ship" then meets the preset condition, the label of image dataset A may be set to "cruise ship", but image 2 in image dataset B must be excluded. To prevent the few images from other categories of image datasets from being swept into the prediction label, the images can be screened so that only those belonging to the same category of image dataset, and numbering no less than a fourth threshold, are retained, and the label of the image dataset to which the screened images belong is set as the prediction label, making the label setting more accurate and reliable.
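A sketch of the screening in SSS_2 and SSS_3 might look like the following; the record format (each image carrying its source dataset id) and the fourth threshold value are assumptions.

```python
# Minimal screening sketch: keep only source datasets contributing at least
# `fourth_threshold` images to the prediction label, so stray images such as
# image 2 from image dataset B in the example above are excluded.
from collections import Counter

def datasets_to_relabel(images_for_label, fourth_threshold=50):
    per_dataset = Counter(img["dataset_id"] for img in images_for_label)
    return [dataset_id for dataset_id, n in per_dataset.items()
            if n >= fourth_threshold]        # these datasets take the prediction label
```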
For the above prediction labels carrying confidences, in an embodiment of the present application, counting the number of images corresponding to each prediction label may include:
SS-A: judging whether the confidence of the prediction label is greater than a preset threshold;
SS-B: if so, determining that the prediction label participates in the frequency statistics.
In this embodiment, whether the confidence of a prediction label is greater than a preset threshold may be determined, and the prediction label is counted only when its confidence exceeds the threshold. Specifically, for example, it may be stipulated that a prediction label of an image participates in the counting only when its confidence is greater than 80%. For the image shown in fig. 6, only the three prediction labels "fireworks", "night scene", and "lake water" would then be counted, while the two prediction labels "Sydney Opera House" and "cruise ship" would not.
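As a sketch, this confidence gate can sit directly in front of the counting step; the 80% figure follows the example above, and the (label, confidence) pair format is an assumption.

```python
# Minimal sketch of the confidence filter applied before counting.
def labels_to_count(predictions, min_confidence=0.80):
    # `predictions` is a list of (label, confidence) pairs for one image.
    return [label for label, confidence in predictions
            if confidence > min_confidence]
```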
It should be noted that the merging of the two image datasets may be performed by updating the label of the image dataset to be merged to the prediction label, where the prediction label serves as the warehousing label. Fig. 7 is a schematic diagram of merging an image dataset originally labeled "Basset Hound" in the new data source Google image library_01 into the image dataset under the matching prediction label in the initial data source. Merging image datasets by updating labels enables rapid migration of large-scale image data and improves data merging efficiency.
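The merge itself then amounts to a label update rather than an image copy, along the lines of the following sketch; the dictionary layout of the data sources is an assumption.

```python
# Minimal merge sketch: updating the dataset's label to the warehousing label
# migrates the whole image dataset into the initial image data source.
def merge_by_label_update(initial_source, new_dataset, prediction_label):
    new_dataset["label"] = prediction_label  # e.g., relabel the "Basset Hound" set
    initial_source.setdefault(prediction_label, []).extend(new_dataset["images"])
    return initial_source
```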
According to the label prediction method above, image data sets of the same category can be merged into an initial image data source. During the data merging process, label prediction is first performed on the images in the image data sets, and the resulting prediction labels fall within the label range of the initial image data source. Whether to merge the corresponding image data sets is then determined according to the occurrence counts of the prediction labels. This embodiment approaches the problem from the opposite direction of the conventional art: it first determines whether the images belong to the same category and only then merges the labels of the corresponding image data sets. This improves the accuracy of image data merging; moreover, because data are merged by setting labels, large-scale image data can be migrated rapidly, improving the merging efficiency for large-scale data.
The present application also provides another embodiment of a label prediction method, as shown in fig. 8, the method may include:
s81: acquiring at least one image dataset, wherein images in the image datasets belong to the same category;
s82: respectively performing label prediction on the images in the image data set within a preset image label range to generate at least one prediction label of each image;
s83: and respectively counting the number of the images corresponding to the prediction labels, and setting the label of the image data set to which the image belongs as the prediction label when the number meets a preset condition.
In this embodiment, reference may be made to S31 and S32 for the specific implementation of S81 and S82, which is not repeated here. In this embodiment, the number of images corresponding to each prediction label may be counted, so that once the image count is obtained, the label of the image dataset to which those images belong can be set directly.
The present application also provides another embodiment of a label prediction method, as shown in fig. 9, the method may include:
s91: acquiring a plurality of images belonging to the same category;
s92: performing label prediction on the plurality of images by using a prediction model, and generating at least one prediction label for each image;
s93: and counting the occurrence times of a single prediction label, and taking the prediction label with the occurrence times meeting a preset condition as a recommendation label of the plurality of images.
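By way of illustration, a sketch of S91 to S93 follows; representing the predictions as per-image label lists, and the occurrence threshold used for the preset condition, are assumptions.

```python
# Minimal sketch: pick the most frequent prediction label as the recommendation.
from collections import Counter

def recommend_label(predicted_labels_per_image, min_occurrences=10):
    counts = Counter(label
                     for labels in predicted_labels_per_image
                     for label in labels)
    if not counts:
        return None
    label, n = counts.most_common(1)[0]      # most frequent single prediction label
    return label if n >= min_occurrences else None
```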
Fig. 10 is a block diagram of an embodiment of a tag prediction apparatus provided in the present application, and as shown in fig. 10, the apparatus includes a processor and a memory for storing processor-executable instructions, and the processor, when executing the instructions, may implement:
acquiring at least one image dataset, wherein images in the image datasets belong to the same category;
respectively performing label prediction on the images in the image data set within a preset image label range to generate at least one prediction label of each image;
and respectively counting the occurrence frequency of each prediction label, acquiring the image corresponding to the prediction label with the frequency meeting the preset condition, and setting the label of the image data set to which the image belongs as the prediction label.
The label prediction device provided by the present application can merge image data sets of the same category into an initial image data source. During the data merging process, label prediction is first performed on the images in the image data sets, and the resulting prediction labels fall within the label range of the initial image data source. Whether to merge the corresponding image data sets is then determined according to the occurrence counts of the prediction labels. This embodiment approaches the problem from the opposite direction of the conventional art: it first determines whether the images belong to the same category and only then merges the labels of the corresponding image data sets. This improves the accuracy of image data merging; moreover, because data are merged by setting labels, large-scale image data can be migrated rapidly, improving the merging efficiency for large-scale data.
Optionally, in an embodiment of the application, before performing label prediction on the images in the image data set within a preset image label range, the processor may further implement:
acquiring image samples of a plurality of known image labels;
and carrying out deep learning processing on the image samples of the plurality of known image labels to obtain a relation model between the image labels and the images.
Optionally, in an embodiment of the present application, when performing the deep learning processing on the image samples of the plurality of known image labels, the processor may implement:
setting a relation model of the image and the image label, wherein training parameters are set in the relation model;
and taking the image sample as input data of the relation model, taking an image label of the image sample as output data of the relation model, and adjusting the training parameters until the relation model meets preset requirements.
Optionally, in an embodiment of the application, when performing label prediction on the images in the image data set within a preset image label range, the processor may implement:
and performing label prediction on the images in the image data set by using the relational model respectively to generate at least one prediction label of each image, wherein the prediction label is contained in the image label of the image sample.
Optionally, in an embodiment of the present application, the images in the image dataset may have at least one identical image tag.
Optionally, in an embodiment of the application, when performing label prediction on the images in the image data set within a preset image label range, the processor may implement:
sampling images in the image data set according to a preset rule;
and respectively carrying out label prediction on the sampled images within a preset image label range.
Optionally, in an embodiment of the application, when performing label prediction on the images in the image data set within a preset image label range, the processor may implement:
respectively performing label prediction on the images in the image data set within a preset image label range to generate at least one prediction label of each image;
and respectively calculating the confidence degrees of the predicted labels.
Optionally, in an embodiment of the application, when counting the number of images corresponding to each prediction label, the processor may implement:
judging whether the confidence of the prediction label is greater than a preset threshold;
and if so, determining that the prediction label participates in the frequency statistics.
Optionally, in an embodiment of the present application, the preset condition may include at least one of the following:
the number of times is greater than a first threshold;
the proportion of the times to the total occurrence times of all the predicted labels is greater than a second threshold value;
the count ranks within the first third-threshold positions when the prediction labels are sorted by occurrence count in descending order.
Optionally, in an embodiment of the application, when acquiring the images corresponding to a prediction label whose count meets the preset condition and setting the label of the image dataset to which those images belong as the prediction label, the processor may implement:
acquiring the images corresponding to the prediction label whose count meets the preset condition;
screening out, from those images, the images that belong to the same category of image dataset and number no less than a fourth threshold;
and setting the label of the image dataset to which the screened images belong as the prediction label.
In another aspect, the present application provides another embodiment of a tag prediction apparatus, which may include a processor and a memory for storing processor-executable instructions, and when the processor executes the instructions, the processor may:
acquiring at least one image dataset, wherein images in the image datasets belong to the same category;
respectively performing label prediction on the images in the image data set within a preset image label range to generate at least one prediction label of each image;
and respectively counting the occurrence frequency of each prediction label, acquiring the image corresponding to the prediction label with the frequency meeting the preset condition, and setting the label of the image data set to which the image belongs as the prediction label.
In another aspect, the present application provides another embodiment of a tag prediction apparatus, which may include a processor and a memory for storing processor-executable instructions, and when the processor executes the instructions, the processor may:
acquiring a plurality of images belonging to the same category;
performing label prediction on the plurality of images by using a prediction model, and generating at least one prediction label for each image;
and counting the occurrence times of a single prediction label, and taking the prediction label with the occurrence times meeting a preset condition as a recommendation label of the plurality of images.
The present application also proposes, in another aspect, a computer-readable storage medium having stored thereon computer instructions that, when executed, implement the steps of:
acquiring at least one image dataset, wherein images in the image datasets belong to the same category;
respectively performing label prediction on the images in the image data set within a preset image label range to generate at least one prediction label of each image;
and counting the number of the prediction labels, acquiring the images corresponding to the prediction labels of which the number meets the preset conditions, and setting the labels of the image data sets to which the images belong as the prediction labels.
The computer readable storage medium may include physical means for storing information, typically by digitizing the information and then storing it on a medium using electrical, magnetic, or optical means. The computer-readable storage medium according to this embodiment may include: devices that store information using electrical energy, such as various types of memory, e.g., RAM and ROM; devices that store information using magnetic energy, such as hard disks, floppy disks, tapes, core memories, bubble memories, and USB disks; and devices that store information optically, such as CDs or DVDs. Of course, other kinds of readable storage media exist as well, such as quantum memories and graphene memories.
Although this application refers in its embodiments to descriptions of data learning and processing such as deep learning methods, label prediction, and data statistics, the application is not limited to cases that fully comply with the data representation and processing described in the embodiments or with standard industrial programming-language designs. Embodiments slightly modified from the descriptions or examples above can also achieve the same, equivalent, or similar implementation effects. Of course, even where those data processing methods are not adopted, the same application can still be implemented as long as the data learning and processing described in the above embodiments is followed; further details are omitted here.
Although the present application provides method steps as described in an embodiment or flowchart, more or fewer steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an actual apparatus or client product executes, it may execute sequentially or in parallel (e.g., in the context of parallel processors or multi-threaded processing) according to the embodiments or methods shown in the figures.
Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered as a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it will be clear to those skilled in the art that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the present application may be embodied as a software product stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, including several instructions that enable a computer device (which may be a personal computer, mobile terminal, server, or network device) to execute the methods of the embodiments, or of parts of the embodiments, of the present application.
The embodiments in this specification are described progressively; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. The application is operational with numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
While the present application has been described through embodiments, those of ordinary skill in the art will appreciate that the application admits of many variations and modifications without departing from its spirit, and it is intended that the appended claims cover such variations and modifications.

Claims (25)

1. A label prediction method, comprising:
acquiring at least one image data set, wherein the images in each image data set belong to the same category;
performing label prediction on the images in the image data set within a preset image label range, respectively, to generate at least one prediction label for each image;
and counting the number of occurrences of each prediction label respectively, acquiring the image corresponding to a prediction label whose count satisfies a preset condition, and setting the label of the image data set to which the image belongs as that prediction label, so as to merge the image data set into the initial image data set corresponding to the prediction label in an initial image data source according to the prediction label of the image data set, wherein the initial image data source comprises initial image data sets under a plurality of image labels, and the preset image label range comprises the image labels of the initial image data sets in the initial image data source.
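By way of a non-limiting illustration of the merging clause of claim 1, a plain dictionary can stand in for the initial image data source; the names below are assumptions of this sketch, not structures prescribed by the claim.

```python
from typing import Dict, List

def merge_into_source(initial_source: Dict[str, List],
                      image_data_set: List,
                      prediction_label: str) -> Dict[str, List]:
    """Merge a newly labeled image data set into the initial image data set
    that carries the same label inside the initial image data source."""
    # each key of initial_source is an image label within the preset label range
    initial_source.setdefault(prediction_label, []).extend(image_data_set)
    return initial_source

# e.g. merge_into_source({"cat": ["img1"]}, ["img2", "img3"], "cat")
# -> {"cat": ["img1", "img2", "img3"]}
```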
2. The method of claim 1, wherein before performing label prediction on the images in the image data set within a preset image label range, the method further comprises:
acquiring image samples of a plurality of known image labels;
and carrying out deep learning processing on the image samples of the plurality of known image labels to obtain a relation model between the image labels and the images.
3. The method of claim 2, wherein the deep learning processing of the image samples of the plurality of known image labels comprises:
setting a relation model of the image and the image label, wherein training parameters are set in the relation model;
and taking the image sample as input data of the relation model, taking an image label of the image sample as output data of the relation model, and adjusting the training parameters until the relation model meets preset requirements.
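A minimal PyTorch-style sketch of the parameter adjustment described in claim 3; the architecture, layer sizes, learning rate, and stopping rule are all assumptions of this sketch, not the relation model actually claimed.

```python
import torch
import torch.nn as nn

# toy relation model between images and image labels (all sizes are assumed)
model = nn.Sequential(nn.Flatten(),
                      nn.Linear(3 * 224 * 224, 512), nn.ReLU(),
                      nn.Linear(512, 1000))  # 1000 = assumed size of the label range
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(samples: torch.Tensor, labels: torch.Tensor) -> float:
    """One adjustment of the training parameters: image samples in, known labels out."""
    optimizer.zero_grad()
    loss = loss_fn(model(samples), labels)  # compare predictions with known labels
    loss.backward()                         # gradients w.r.t. the training parameters
    optimizer.step()                        # adjust the training parameters
    return loss.item()

# repeat train_step over the image samples until the model "meets the preset
# requirements", e.g. the loss falls below some chosen threshold
```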
4. The method according to claim 2 or 3, wherein the label prediction of the images in the image data set within a preset image label range comprises:
and performing label prediction on the images in the image data set by using the relation model, respectively, to generate at least one prediction label for each image, wherein the prediction label is contained among the image labels of the image samples.
5. The method of claim 1, wherein the images in the image data set have at least one identical image label.
6. The method according to claim 1, wherein the label prediction of the images in the image data set within a preset image label range comprises:
sampling the images in the image data set according to a preset rule;
and performing label prediction on the sampled images within the preset image label range, respectively.
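Claim 6 leaves the sampling rule open; two common rules are sketched below purely as assumptions, with `k` an illustrative sample size.

```python
import random
from typing import List

def sample_images(data_set: List, rule: str = "random", k: int = 100) -> List:
    """Sample images from an image data set according to a preset rule."""
    if rule == "random":
        # uniform sampling without replacement
        return random.sample(data_set, min(k, len(data_set)))
    if rule == "every_nth":
        # stride sampling: take evenly spaced images
        step = max(1, len(data_set) // k)
        return data_set[::step][:k]
    raise ValueError(f"unknown sampling rule: {rule!r}")
```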
7. The method according to claim 1, wherein the label prediction of the images in the image data set within a preset image label range comprises:
performing label prediction on the images in the image data set within a preset image label range, respectively, to generate at least one prediction label for each image;
and calculating a confidence degree for each prediction label, respectively.
8. The method of claim 7, wherein said counting the number of occurrences of each prediction label respectively comprises:
judging whether the confidence degree of a prediction label is greater than a preset threshold;
and if so, allowing that prediction label to participate in the frequency statistics.
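One way to realize claims 7 and 8 together is to let a prediction label enter the frequency statistics only when its confidence clears the preset threshold; the 0.5 value and the (label, confidence) pair format below are assumptions of this sketch.

```python
from collections import Counter
from typing import Iterable, Tuple

def count_confident_labels(predictions: Iterable[Tuple[str, float]],
                           threshold: float = 0.5) -> Counter:
    """Count occurrences of prediction labels whose confidence exceeds the threshold."""
    counts: Counter = Counter()
    for label, confidence in predictions:
        if confidence > threshold:  # claim 8: only these participate in the count
            counts[label] += 1
    return counts
```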
9. The method of claim 1, wherein the preset condition comprises at least one of:
the number of occurrences is greater than a first threshold;
the proportion of the number of occurrences to the total occurrences of all prediction labels is greater than a second threshold;
the number of occurrences ranks within the first third-threshold positions when the prediction labels are sorted in descending order of occurrences.
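The three alternative preset conditions of claim 9, checked for a single label; the first, second, and third threshold values below are illustrative placeholders.

```python
from collections import Counter

def meets_preset_condition(label: str, counts: Counter,
                           first: int = 5, second: float = 0.3,
                           third: int = 3) -> bool:
    """True if the label's count satisfies any of the three conditions of claim 9."""
    total = sum(counts.values()) or 1
    ranking = [l for l, _ in counts.most_common()]  # descending by occurrences
    return (counts[label] > first                   # count exceeds first threshold
            or counts[label] / total > second       # share exceeds second threshold
            or (label in ranking and ranking.index(label) < third))  # top positions
```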
10. The method according to claim 1, wherein the acquiring of the image corresponding to the prediction label whose count satisfies a preset condition, and the setting of the label of the image data set to which the image belongs as the prediction label, comprise:
acquiring the images corresponding to the prediction label whose count satisfies the preset condition;
screening out, from the acquired images, the images that belong to image data sets of the same category and whose number is not less than a fourth threshold;
and setting the label of the image data set to which the screened images belong as the prediction label.
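A sketch of the screening step of claim 10: only image data sets contributing at least a fourth-threshold number of matching images get the new label. The (dataset_id, image) pair format is an assumption of this sketch.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

def datasets_to_relabel(matched: Iterable[Tuple[str, object]],
                        fourth_threshold: int = 5) -> Set[str]:
    """Return the ids of the image data sets whose label should be set.

    `matched` holds (dataset_id, image) pairs for the images whose prediction
    label satisfied the preset condition.
    """
    per_dataset = Counter(ds for ds, _ in matched)
    # keep data sets with no fewer than `fourth_threshold` matching images
    return {ds for ds, n in per_dataset.items() if n >= fourth_threshold}
```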
11. A label prediction method, comprising:
acquiring at least one image data set, wherein the images in each image data set belong to the same category;
performing label prediction on the images in the image data set within a preset image label range, respectively, to generate at least one prediction label for each image;
and counting the number of images corresponding to each prediction label respectively, and, when the number satisfies a preset condition, acquiring the images corresponding to that prediction label and setting the label of the image data set to which the images belong as the prediction label, so as to merge the image data set into the initial image data set corresponding to the prediction label in an initial image data source according to the prediction label of the image data set, wherein the initial image data source comprises initial image data sets under a plurality of image labels, and the preset image label range comprises the image labels of the initial image data sets in the initial image data source;
wherein the images in the image data set have at least one identical image label; and the performing of label prediction on the images in the image data set within a preset image label range respectively comprises: sampling the images in the image data set according to a preset rule; and performing label prediction on the sampled images within the preset image label range, respectively.
12. A label prediction method, comprising:
acquiring a plurality of images belonging to the same category;
performing label prediction on the plurality of images by using a prediction model, and generating at least one prediction label for each image;
counting the number of occurrences of each single prediction label, and taking a prediction label whose occurrence count satisfies a preset condition as a recommendation label of the plurality of images, so as to merge the plurality of images into the initial image data set corresponding to the recommendation label in an initial image data source according to the recommendation label of the plurality of images, wherein the initial image data source comprises initial image data sets under a plurality of image labels, and the preset image label range comprises the image labels of the initial image data sets in the initial image data source.
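Claim 12 in miniature: tally each single prediction label over a same-category group of images and promote the winner to a recommendation label. The `predict` callable and the majority-share condition below are assumptions of this sketch.

```python
from collections import Counter
from typing import Callable, List, Optional

def recommend_label(images: List, predict: Callable,
                    min_share: float = 0.5) -> Optional[str]:
    """Return the prediction label meeting the preset condition, if any."""
    counts = Counter(label for image in images for label in predict(image))
    if not counts:
        return None
    label, count = counts.most_common(1)[0]
    # preset condition here: the label appears on at least min_share of the images
    return label if count / len(images) >= min_share else None
```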
13. A label prediction apparatus, comprising a processor and a memory for storing processor-executable instructions which, when executed by the processor, implement:
acquiring at least one image data set, wherein the images in each image data set belong to the same category;
performing label prediction on the images in the image data set within a preset image label range, respectively, to generate at least one prediction label for each image;
and counting the number of occurrences of each prediction label respectively, acquiring the image corresponding to a prediction label whose count satisfies a preset condition, and setting the label of the image data set to which the image belongs as that prediction label, so as to merge the image data set into the initial image data set corresponding to the prediction label in an initial image data source according to the prediction label of the image data set, wherein the initial image data source comprises initial image data sets under a plurality of image labels, and the preset image label range comprises the image labels of the initial image data sets in the initial image data source.
14. The apparatus of claim 13, wherein the processor further performs, before performing label prediction on the images in the image data set within a preset image label range, the steps of:
acquiring image samples of a plurality of known image labels;
and carrying out deep learning processing on the image samples of the plurality of known image labels to obtain a relation model between the image labels and the images.
15. The apparatus of claim 14, wherein, when implementing the step of performing deep learning processing on the image samples of the plurality of known image labels, the processor performs:
setting a relation model of the image and the image label, wherein training parameters are set in the relation model;
and taking the image sample as input data of the relation model, taking an image label of the image sample as output data of the relation model, and adjusting the training parameters until the relation model meets preset requirements.
16. The apparatus according to claim 14 or 15, wherein, when performing the step of label prediction on the images in the image data set within a preset image label range, the processor performs:
performing label prediction on the images in the image data set by using the relation model, respectively, to generate at least one prediction label for each image, wherein the prediction label is contained among the image labels of the image samples.
17. The apparatus of claim 13, wherein the images in the image data set have at least one identical image label.
18. The apparatus of claim 13, wherein, when performing the step of label prediction on the images in the image data set within a preset image label range, the processor performs:
sampling the images in the image data set according to a preset rule;
and performing label prediction on the sampled images within the preset image label range, respectively.
19. The apparatus of claim 13, wherein, when performing the step of label prediction on the images in the image data set within a preset image label range, the processor performs:
performing label prediction on the images in the image data set within a preset image label range, respectively, to generate at least one prediction label for each image;
and calculating a confidence degree for each prediction label, respectively.
20. The apparatus of claim 19, wherein, when performing the step of counting the number of occurrences of each prediction label respectively, the processor performs:
judging whether the confidence degree of a prediction label is greater than a preset threshold;
and if so, allowing that prediction label to participate in the frequency statistics.
21. The apparatus of claim 13, wherein the preset condition comprises at least one of:
the number of occurrences is greater than a first threshold;
the proportion of the number of occurrences to the total occurrences of all prediction labels is greater than a second threshold;
the number of occurrences ranks within the first third-threshold positions when the prediction labels are sorted in descending order of occurrences.
22. The apparatus according to claim 13, wherein, when acquiring the image corresponding to the prediction label whose count satisfies a preset condition and setting the label of the image data set to which the image belongs as the prediction label, the processor performs:
acquiring the images corresponding to the prediction label whose count satisfies the preset condition;
screening out, from the acquired images, the images that belong to image data sets of the same category and whose number is not less than a fourth threshold;
and setting the label of the image data set to which the screened images belong as the prediction label.
23. A label prediction apparatus, comprising a processor and a memory for storing processor-executable instructions which, when executed by the processor, implement:
acquiring at least one image data set, wherein the images in each image data set belong to the same category;
performing label prediction on the images in the image data set within a preset image label range, respectively, to generate at least one prediction label for each image;
and counting the number of occurrences of each prediction label respectively, acquiring the image corresponding to a prediction label whose count satisfies a preset condition, and setting the label of the image data set to which the image belongs as that prediction label, so as to merge the image data set into the initial image data set corresponding to the prediction label in an initial image data source according to the prediction label of the image data set, wherein the initial image data source comprises initial image data sets under a plurality of image labels, and the preset image label range comprises the image labels of the initial image data sets in the initial image data source;
wherein the images in the image data set have at least one identical image label; and the performing of label prediction on the images in the image data set within a preset image label range respectively comprises: sampling the images in the image data set according to a preset rule; and performing label prediction on the sampled images within the preset image label range, respectively.
24. A label prediction apparatus, comprising a processor and a memory for storing processor-executable instructions which, when executed by the processor, implement:
acquiring a plurality of images belonging to the same category;
performing label prediction on the plurality of images by using a prediction model, and generating at least one prediction label for each image;
counting the number of occurrences of each single prediction label, and taking a prediction label whose occurrence count satisfies a preset condition as a recommendation label of the plurality of images, so as to merge the plurality of images into the initial image data set corresponding to the recommendation label in an initial image data source according to the recommendation label of the plurality of images, wherein the initial image data source comprises initial image data sets under a plurality of image labels, and the preset image label range comprises the image labels of the initial image data sets in the initial image data source.
25. A computer-readable storage medium having computer instructions stored thereon which, when executed, perform the steps of:
acquiring at least one image data set, wherein the images in each image data set belong to the same category;
performing label prediction on the images in the image data set within a preset image label range, respectively, to generate at least one prediction label for each image;
and counting the number of occurrences of each prediction label respectively, acquiring the image corresponding to a prediction label whose count satisfies a preset condition, and setting the label of the image data set to which the image belongs as that prediction label, so as to merge the image data set into the initial image data set corresponding to the prediction label in an initial image data source according to the prediction label of the image data set, wherein the initial image data source comprises initial image data sets under a plurality of image labels, and the preset image label range comprises the image labels of the initial image data sets in the initial image data source.
CN201710363676.0A 2017-05-22 2017-05-22 Label prediction method and device Active CN108959304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710363676.0A CN108959304B (en) 2017-05-22 2017-05-22 Label prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710363676.0A CN108959304B (en) 2017-05-22 2017-05-22 Label prediction method and device

Publications (2)

Publication Number Publication Date
CN108959304A CN108959304A (en) 2018-12-07
CN108959304B true CN108959304B (en) 2022-03-25

Family

ID=64461554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710363676.0A Active CN108959304B (en) 2017-05-22 2017-05-22 Label prediction method and device

Country Status (1)

Country Link
CN (1) CN108959304B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147852A (en) 2019-05-29 2019-08-20 北京达佳互联信息技术有限公司 Method, apparatus, equipment and the storage medium of image recognition
CN112131417B (en) * 2019-06-25 2024-04-02 北京百度网讯科技有限公司 Image tag generation method and device
CN111950344B (en) * 2020-06-28 2023-06-27 北京百度网讯科技有限公司 Biological category identification method and device, storage medium and electronic equipment
CN112686316A (en) * 2020-12-30 2021-04-20 上海掌门科技有限公司 Method and equipment for determining label
CN113344097B (en) * 2021-06-21 2024-03-19 特赞(上海)信息科技有限公司 Image processing method and device based on multiple models

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102902737A (en) * 2012-09-12 2013-01-30 西安交通大学 Automatic collecting and screening method for network images
CN103279497A (en) * 2013-05-07 2013-09-04 珠海金山办公软件有限公司 Method, system and device automatically conducting sort operation based on data types
CN104572965A (en) * 2014-12-31 2015-04-29 南京理工大学 Search-by-image system based on convolutional neural network
CN105117429A (en) * 2015-08-05 2015-12-02 广东工业大学 Scenario image annotation method based on active learning and multi-label multi-instance learning
CN105426485A (en) * 2015-11-20 2016-03-23 小米科技有限责任公司 Image combination method and device, intelligent terminal and server
CN106446950A (en) * 2016-09-27 2017-02-22 腾讯科技(深圳)有限公司 Image processing method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150324689A1 (en) * 2014-05-12 2015-11-12 Qualcomm Incorporated Customized classifier over common features

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fully Convolutional Neural Networks for Remote Sensing Image Classification; Emmanuel Maggiori et al.; IEEE International Geoscience and Remote Sensing Symposium; 2016-12-31; 5071-5074 *
Image annotation method based on multi-label learning convolutional neural networks; Gao Yaodong et al.; 《计算机应用》 (Journal of Computer Applications); 2017-01-10; Vol. 37, No. 1; 228-232 *

Also Published As

Publication number Publication date
CN108959304A (en) 2018-12-07

Similar Documents

Publication Publication Date Title
CN108959304B (en) Label prediction method and device
CN113283551B (en) Training method and training device of multi-mode pre-training model and electronic equipment
US11550871B1 (en) Processing structured documents using convolutional neural networks
CN111753060A (en) Information retrieval method, device, equipment and computer readable storage medium
CN112131449B (en) Method for realizing cultural resource cascade query interface based on ElasticSearch
CN110619051B (en) Question sentence classification method, device, electronic equipment and storage medium
CN109344298A (en) A kind of method and device converting unstructured data to structural data
CN112347223A (en) Document retrieval method, document retrieval equipment and computer-readable storage medium
CN115730597A (en) Multi-level semantic intention recognition method and related equipment thereof
CN109271624A (en) A kind of target word determines method, apparatus and storage medium
CN113139110B (en) Regional characteristic processing method, regional characteristic processing device, regional characteristic processing equipment, storage medium and program product
CN114491071A (en) Food safety knowledge graph construction method and system based on cross-media data
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
Qian et al. Boosted multi-modal supervised latent Dirichlet allocation for social event classification
CN110309355A (en) Generation method, device, equipment and the storage medium of content tab
Ciecko Examining the impact of artificial intelligence in museums
CN112925912A (en) Text processing method, and synonymous text recall method and device
CN117077679A (en) Named entity recognition method and device
CN116756281A (en) Knowledge question-answering method, device, equipment and medium
CN116842934A (en) Multi-document fusion deep learning title generation method based on continuous learning
CN113806536B (en) Text classification method and device, equipment, medium and product thereof
CN115129902A (en) Media data processing method, device, equipment and storage medium
CN113076468A (en) Nested event extraction method based on domain pre-training
CN115269901A (en) Method, device and equipment for generating extended image
CN115130453A (en) Interactive information generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant