WO2020108234A1 - Image index generation method, image search method, apparatus, terminal and medium - Google Patents

Image index generation method, image search method, apparatus, terminal and medium

Info

Publication number
WO2020108234A1
WO2020108234A1 (PCT/CN2019/115411)
Authority
WO
WIPO (PCT)
Prior art keywords
image
index
sentence
keyword
search
Prior art date
Application number
PCT/CN2019/115411
Other languages
English (en)
French (fr)
Inventor
侯允
刘耀勇
陈岩
Original Assignee
Oppo广东移动通信有限公司
Priority date
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2020108234A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 - Indexing; Data structures therefor; Storage structures
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Definitions

  • This application relates to the field of search technology, in particular to an image index generation method, image search method, device, terminal and medium.
  • a photo album application is usually installed in the terminal, and the photo album application is generally used to store captured images, images saved from the network, and the like.
  • Embodiments of the present application provide an image index generation method, image search method, device, terminal, and medium.
  • the technical solution is as follows:
  • an image index generation method includes:
  • the description sentence is determined as an index of the first image, and the index is stored in correspondence with the first image.
  • an image search method including:
  • the index corresponding to the second image includes a first target keyword, and the first target keyword matches the first keyword;
  • the index corresponding to the second image is a description sentence generated according to the recognition result of the second image
  • a search result is displayed, the search result including the second image.
  • an image index generation device comprising:
  • the image acquisition module is used to acquire the first image
  • An image recognition module configured to perform image recognition on the first image to obtain a recognition result corresponding to the first image
  • a sentence generating module configured to generate a description sentence according to the recognition result, and the description sentence is used to describe the first image
  • the index generation module is configured to determine the description sentence as an index of the first image, and store the index corresponding to the first image.
  • an image search device including:
  • a search box display module, used to display the search box;
  • a keyword receiving module, configured to receive the first keyword input in the search box;
  • an image search module, used to search a photo album for a second image matching the first keyword, where the index corresponding to the second image includes a first target keyword, the first target keyword matches the first keyword, and the index corresponding to the second image is a description sentence generated according to the recognition result of the second image;
  • a result display module, used to display search results, the search results including the second image.
  • an embodiment of the present application provides a terminal, the terminal includes a processor and a memory, and the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the foregoing image index generation method, or to implement the foregoing image search method.
  • an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored, and the computer program is loaded and executed by a processor to implement the foregoing image index generation method, or to implement the foregoing image search method.
  • FIG. 3 is a flowchart of an image search method provided by an embodiment of this application.
  • FIG. 5 is a block diagram of an image index generation device provided by an embodiment of the present application.
  • FIG. 6 is a block diagram of an image search device provided by an embodiment of the present application.
  • FIG. 7 is a block diagram of a terminal provided by an embodiment of the present application.
  • Embodiments of the present application provide an image index generation method, device, terminal, and storage medium.
  • the above description sentence is determined as the index of the image; then, when the user needs to search for the image, he can input words included in the index, or words whose meanings are similar to those of the words included in the index, and the terminal can accurately find the image according to the words entered by the user, which improves the efficiency of searching for images in the album.
  • the execution subject of each step is a terminal.
  • a photo album application is installed in the terminal, and the photo album application refers to an application for storing images.
  • the image may be an image (including photos and videos) taken by the user, or an image (including photos and videos) saved by the user from other applications.
  • the terminal may be a mobile phone, a tablet computer, a personal computer, a smart wearable device, a camera, a smart playback device, and so on.
  • An embodiment of the present application provides an image index generation method.
  • the method includes:
  • the description sentence is determined as an index of the first image, and the index is stored in correspondence with the first image.
  • the generating a description sentence based on the recognition result includes:
  • the recognition result is converted into a first word vector, and the first word vector is processed through a language description model to obtain the description sentence.
  • the method further includes:
  • associated information of the first image is acquired, the associated information including at least one of the following: location information, time information, and scene information;
  • the generating a description sentence based on the recognition result includes:
  • the recognition result is converted into a first word vector, the associated information is converted into a second word vector, and the first word vector and the second word vector are processed through a language description model to obtain the description sentence.
  • before determining the description sentence as an index of the first image and storing the index in correspondence with the first image, the method further includes:
  • inquiry information is displayed, the inquiry information being used to inquire whether to determine the description sentence as the index;
  • the method further includes:
  • performing image recognition on the first image to obtain a recognition result corresponding to the first image includes:
  • the image recognition model is a neural network model trained by using multiple sample images, and the object in each sample image of the multiple sample images corresponds to a classification label.
  • before generating the description sentence based on the recognition result, the method further includes:
  • acquiring a training sample set, the training sample set including a plurality of sample images, each sample image corresponding to an expected description sentence for its recognition result;
  • for each sample image, processing the recognition result through a language description model and outputting an actual description sentence;
  • calculating the error between the actual description sentence and the expected description sentence;
  • when the error is greater than a preset threshold, adjusting the parameters of the language description model and resuming execution from the step of processing each sample image through the language description model and outputting the actual description sentence;
  • when the error is less than or equal to the preset threshold, stopping the training to obtain the trained language description model, the language description model being used to generate the description sentence according to the recognition result.
  • An embodiment of the present application also provides an image search method.
  • the method includes:
  • the index corresponding to the second image includes a first target keyword, and the first target keyword matches the first keyword;
  • the index corresponding to the second image is a description sentence generated according to the recognition result of the second image
  • a search result is displayed, the search result including the second image.
  • the method further includes:
  • prompt information is displayed, and the prompt information is used to prompt the input of the second keyword;
  • an index corresponding to the third image includes a second target keyword, and the second target keyword matches the second keyword;
  • the search result includes the third image.
  • FIG. 1 shows a flowchart of an image index generation method provided by an embodiment of the present application.
  • the method may include the following steps:
  • Step 101 Acquire a first image.
  • the first image may be an image collected by a camera on the terminal.
  • a camera is provided on the terminal and a shooting application is installed.
  • the shooting application refers to an application used to capture an image, for example, a camera application, a beauty application, or other applications.
  • when the shooting application is running and the terminal receives a trigger signal of the shooting control on the current shooting interface, it acquires the image collected by the camera as the first image.
  • the first image may not be an image collected by a camera on the terminal, but rather an image saved by the user from another application.
  • the first image is an image obtained from the network or a screenshot.
  • when an image is displayed on the terminal's display interface and the terminal receives a save instruction corresponding to the image, the image is acquired from the network as the first image according to the save instruction.
  • the embodiment of the present application does not limit the acquisition method and timing of the first image.
  • Step 102 Perform image recognition on the first image to obtain a recognition result corresponding to the first image.
  • the recognition result corresponding to the first image is used to indicate the object included in the first image.
  • the first image may include one or more objects, such as people, animals, buildings, landscapes, and so on.
  • the terminal determines the category to which each object belongs through the following step.
  • the category to which each object belongs is used to indicate the specific class of the object,
  • for example, whether the object is a cat, a dog, grass, a person, or another category:
  • the image recognition model performs image recognition on the first image to obtain recognition results corresponding to at least one object in the first image, respectively.
  • the image recognition model is a neural network model trained using multiple sample images.
  • the image recognition model may be obtained by training the deep learning network using multiple sample images.
  • the object in each sample image of the multiple sample images corresponds to a classification label, and the classification label is used to characterize the category to which the object belongs.
  • the image recognition model includes: an input layer, at least one convolutional layer (for example, three convolutional layers: a first, a second, and a third convolutional layer), at least one fully connected layer (for example, two fully connected layers: a first and a second fully connected layer), and an output layer.
  • the input data of the input layer is the first image
  • the output result of the output layer is the classification to which each of the at least one object included in the first image belongs.
  • the image recognition process is as follows: the first image is input to the input layer of the image recognition model, the features of the first image are extracted by the convolutional layer of the image recognition model, the above features are then combined and abstracted by the fully connected layer of the image recognition model to obtain data suitable for classification in the output layer, and finally the output layer outputs the recognition results corresponding to the at least one object included in the first image, respectively.
  • the specific structures of the convolution layer and the fully connected layer of the image recognition model are not limited.
  • the image recognition model shown in the above embodiment is only exemplary and explanatory, and is not used to limit the present application.
  • in general, the more layers a convolutional neural network has, the better the effect, but the longer the computation time.
  • in practical applications, a convolutional neural network with an appropriate number of layers can be designed in light of the requirements for recognition accuracy and efficiency.
  • the sample image refers to an image selected in advance for training the image recognition model.
  • the sample image has a classification label.
  • the classification label of the sample image is usually determined manually, and is used to describe the scene, item, person, etc. corresponding to the sample image.
  • the neural network may be a deep learning network
  • the deep learning network may use an AlexNet network, a VGG-16 network, a GoogLeNet network, a Deep Residual Learning (ResNet) network, etc., which is not limited in the embodiments of the present application.
  • the algorithms used in training the deep learning network may be BP (Back-Propagation), the Faster R-CNN (Regions with Convolutional Neural Network) algorithm, etc., which is not limited in the embodiments of the present application.
  • when the error between the recognition result and the classification label is less than a preset value, the trained deep learning network is obtained, that is, the image recognition model is obtained.
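  • The patent does not tie the recognizer to any particular framework; the following is a minimal sketch, assuming PyTorch, of the layer structure described above (an input layer, three convolutional layers, two fully connected layers, and an output layer). The layer sizes and the num_classes value are illustrative assumptions, not values from the source.

```python
import torch
import torch.nn as nn

class ImageRecognitionModel(nn.Module):
    """Minimal CNN matching the structure sketched in the text:
    three convolutional layers, two fully connected layers, one output layer."""
    def __init__(self, num_classes: int = 80):  # num_classes is an assumed value
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 28 * 28, 256), nn.ReLU(),  # first fully connected layer
            nn.Linear(256, num_classes),              # second fully connected / output layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Usage: classify a batch of 224x224 RGB images.
model = ImageRecognitionModel()
logits = model(torch.randn(1, 3, 224, 224))
labels = logits.argmax(dim=1)  # per-image class indices (e.g., "dog", "grass")
```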
  • Step 103 Generate a description sentence according to the recognition result.
  • the description sentence is used to describe the first image.
  • the description sentence includes the recognition results corresponding to at least one object respectively.
  • the description sentence also includes other words, which can be used to describe at least one of the following: the positional relationship between at least two objects, the action being performed by an object, the state of an object, and so on.
  • for example, the first image is recognized, and the objects in the first image are found to include a dog and grass, with the dog's posture on the grass being running; the above recognition results are input into a language description model, and the description sentence obtained for the first image is "a dog running on the grass".
  • the language description model includes: an input layer, at least one convolutional layer (for example, three convolutional layers: a first, a second, and a third convolutional layer), at least one fully connected layer (for example, two fully connected layers: a first and a second fully connected layer), and an output layer.
  • the input data of the input layer is the first image and the recognition results of the objects in the first image.
  • the output result of the output layer is the description sentence corresponding to the first image.
  • the generation process of the description sentence is as follows: the first image and the recognition results of the objects in the first image are input to the input layer of the language description model, the convolutional layer of the language description model extracts features of the above input, the fully connected layer of the language description model then combines and abstracts these features, and finally the output layer outputs the description sentence corresponding to the first image.
  • the specific structures of the convolutional layer and the fully connected layer of the language description model are not limited.
  • the language description model shown in the above embodiment is only exemplary and explanatory, and is not intended to limit the application.
  • in general, the more layers a convolutional neural network has, the better the effect, but the longer the computation time.
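  • The source specifies the language description model only at this layer level; purely as an assumption, one common way such a captioner is realized is to project the combined image/recognition features and decode the sentence word by word with a recurrent layer. A minimal sketch, assuming PyTorch, with all dimensions illustrative:

```python
import torch
import torch.nn as nn

class LanguageDescriptionModel(nn.Module):
    """Sketch of a captioner: project input features, then decode a sentence
    with an LSTM. Vocabulary size and dimensions are illustrative."""
    def __init__(self, feature_dim: int = 64, hidden_dim: int = 128, vocab_size: int = 1000):
        super().__init__()
        self.project = nn.Linear(feature_dim, hidden_dim)
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.to_vocab = nn.Linear(hidden_dim, vocab_size)

    def forward(self, features: torch.Tensor, max_len: int = 10) -> torch.Tensor:
        # features: (batch, feature_dim) combined image/recognition features
        step = self.project(features).unsqueeze(1)  # (batch, 1, hidden)
        words, state = [], None
        for _ in range(max_len):
            out, state = self.decoder(step, state)
            words.append(self.to_vocab(out))  # logits for the next word
            step = out                        # feed the hidden state forward
        return torch.cat(words, dim=1)        # (batch, max_len, vocab)
```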
  • step 103 may include the following sub-steps:
  • step 103 can be implemented as:
  • Step 103a: Convert the recognition result into a first word vector.
  • Step 103b: Process the first word vector through the language description model to obtain a description sentence.
  • the terminal converts the recognition result into a corresponding word vector through a word vector model.
  • the word vector refers to a vector representing words
  • the word vector model refers to a model that converts words into word vectors; the word vector is input into the language description model, and the language description model outputs the description sentence.
  • the above word vector model may be a word2vec model.
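  • word2vec is the one concrete word vector model the text names; a brief sketch of converting a recognition result such as "dog" into a word vector, assuming the gensim library and a toy training corpus (both assumptions not made in the source):

```python
from gensim.models import Word2Vec

# Toy training corpus; in practice a large text corpus would be used.
sentences = [["dog", "running", "on", "grass"],
             ["cat", "sleeping", "on", "sofa"],
             ["dog", "playing", "in", "park"]]

# Train a small word2vec model (vector_size is an illustrative choice).
w2v = Word2Vec(sentences, vector_size=32, min_count=1, seed=0)

# Convert a recognition result into its first word vector.
recognition_result = "dog"
first_word_vector = w2v.wv[recognition_result]  # numpy array of shape (32,)
print(first_word_vector.shape)
```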
  • in another example, the terminal may also acquire associated information of the first image; in this case, step 103 can be implemented as:
  • the associated information includes at least one of the following: location information, time information, and scene information.
  • Location information is used to indicate the geographic location when the first image was taken, for example, Shanghai, Beijing, Canada, etc.
  • Time information is used to indicate the time when the first image was acquired, for example, spring, summer, autumn, winter, early morning, evening, and so on.
  • the scene information is used to indicate the scene corresponding to the first image, for example, parks, beaches, shopping malls, schools, etc.
  • the terminal can convert the related information into the corresponding word vector through the word vector model.
  • the terminal inputs the first word vector and the second word vector into the language description model, so that the finally generated description sentence is richer.
  • the following uses the associated information as location information as an example for description.
  • the first word vector and the second word vector are processed through the language description model to obtain a description sentence.
  • the location information is used to indicate the geographic location when the first image is taken.
  • the position information can be obtained by a positioning component in the terminal, for example, a GPS (Global Positioning System) component.
  • the terminal may also obtain the position information of the first image by performing image recognition on the first image.
  • for the method of converting the position information into a word vector, reference may be made to step 103a, which will not be repeated here.
  • the description sentence corresponding to the first image is generated in combination with the geographic location where the first image was taken, so that the first image can be described more completely, and subsequent users can search for the first image through multiple different keywords, enhancing the convenience of searching.
  • the first image is recognized, and the objects in the first image are found to include a dog and grass, with the dog's posture on the grass being running; in addition, the geographic location where the first image was taken is XX Park, so the description sentence corresponding to the first image is "a dog running on the grass in XX Park".
  • Step 104 Determine the description sentence as the index of the first image, and store the index corresponding to the first image.
  • the terminal determines the description sentence as the index of the first image and stores the index in correspondence with the first image. Subsequently, if the user needs to search for the first image, he only needs to input at least one word included in the description sentence, or a word matching a word in the description sentence (for example, a word whose similarity to a word in the description sentence is greater than a preset threshold); the terminal can then find the first image according to the words input by the user and display the first image to the user.
  • the embodiment of the present application does not limit the path for storing the description sentence and the first image, which may be preset by the terminal or may be set by the user.
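  • The source does not specify a storage format or API; a minimal sketch, assuming SQLite via Python's standard sqlite3 module, of storing the description sentence as the index alongside the image path (the file, table, and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect("album_index.db")  # hypothetical database file
conn.execute("""CREATE TABLE IF NOT EXISTS image_index (
                    image_path TEXT PRIMARY KEY,
                    description TEXT NOT NULL)""")

def store_index(image_path: str, description: str) -> None:
    """Store the generated description sentence as the image's index."""
    conn.execute("INSERT OR REPLACE INTO image_index VALUES (?, ?)",
                 (image_path, description))
    conn.commit()

store_index("/album/IMG_0001.jpg", "a dog running on the grass in XX Park")
```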
  • the technical solution provided by the embodiments of the present application recognizes the recognition results corresponding to each object included in the image, generates a description sentence describing the image according to the recognition results, and determines the above description sentence as the index of the image; when the user later needs to search for the image, he can input words included in the index, or words whose meanings are similar to those words, and the terminal can accurately find the image according to the words entered by the user, improving the efficiency of searching for images in the album.
  • in addition, because the description sentence used as the index is generated from the image's own recognition result, the generated index is accurate.
  • FIG. 2 shows a flowchart of an image index generation method provided by another embodiment of the present application.
  • the method may include the following steps:
  • Step 201 Acquire a first image.
  • Step 202 Perform image recognition on the first image to obtain a recognition result corresponding to the first image.
  • Step 203 Generate a description sentence according to the recognition result.
  • Step 204 Display inquiry information.
  • the inquiry information is used to inquire whether to determine the description sentence as an index.
  • for example, the inquiry message is: "The description sentence corresponding to the image is 'watching a concert at the Bird's Nest'. Confirm?".
  • the user can preview the description sentence generated by the language description model, and decide whether to determine the description sentence generated above as the index of the first image.
  • Step 205 when receiving the confirmation instruction corresponding to the inquiry information, determine the description sentence as the index of the first image, and store the index corresponding to the first image.
  • if the user decides to use the generated description sentence as the index of the image, a confirmation instruction can be issued for the inquiry information.
  • the confirmation instruction corresponding to the inquiry information is used to confirm that the generated description sentence is determined as the index of the image.
  • a confirmation control is displayed beside the inquiry information, and when the terminal receives a trigger signal acting on the confirmation control, the terminal receives a confirmation instruction corresponding to the inquiry information.
  • Step 206 when the confirmation instruction is not received, an input box is displayed.
  • the input box is used to receive a description sentence corresponding to the first image input by the user.
  • if the terminal does not receive a trigger signal acting on the confirmation control within a preset time, the terminal has not received the confirmation instruction.
  • optionally, a denial control is also displayed beside the inquiry information; when the terminal receives a trigger signal corresponding to the denial control, the confirmation instruction has not been received, and the terminal may display the input box at this time.
  • Step 207 Receive the sentence input in the input box.
  • Step 208 Determine the input sentence as the index of the first image, and store the index corresponding to the first image.
  • the user judges whether to confirm the generated description sentence as the index of the image, and if the user is not satisfied with the description sentence generated by the terminal, the user inputs the description sentence corresponding to the image himself, so that he can subsequently search for the image according to the description sentence he entered; this improves the accuracy of the index and further improves the final image indexing efficiency.
  • after the index of the first image is generated, the user can search for the first image in the album according to the index.
  • an embodiment of the present application further provides an image search method, which may include the following steps:
  • Step 301 Display a search box.
  • the search box is used for the user to input a search keyword, so that the terminal can find an image matching the search keyword.
  • the search box is displayed on the main interface of the album application.
  • the main interface of the album application program displays a search control.
  • the terminal receives a trigger signal corresponding to the search control, and displays a search box according to the trigger signal.
  • the embodiment of the present application does not limit the display manner of the search box.
  • Step 302 Receive the first keyword entered in the search box.
  • the first keyword is input by the user, and it may be "Forbidden City", "cat", "rose", etc., which is not limited in this embodiment of the present application.
  • Step 303 Search the album for the second image that matches the first keyword.
  • the number of second images may be one or more.
  • the index corresponding to the second image is used to describe the second image.
  • the index corresponding to the second image is a description sentence generated according to the recognition result of the second image.
  • the index corresponding to the second image includes the first target keyword.
  • the first target keyword may be a recognition result corresponding to the object included in the second image, or may be other words in the description sentence other than the recognition result, which is not limited in this embodiment of the present application. In this way, users can search the same image with different keywords, reducing the difficulty of searching for images.
  • the first target keyword matches the first keyword, for example, the similarity between the first target keyword and the first keyword meets a preset condition.
  • the preset condition may be that the similarity between the first target keyword and the first keyword is greater than a preset threshold, and the preset threshold may be set according to actual requirements, which is not limited in this embodiment of the present application.
  • the terminal first calculates the similarity between the words included in each description sentence stored in the terminal and the first keyword, then determines the words whose similarity to the first keyword meets the preset condition as the first target keywords, and finally takes the images corresponding to the description sentences containing a first target keyword as the second images matching the first keyword.
  • the similarity between the first keyword and the words included in a description sentence can be calculated as follows: the terminal represents the first keyword as a first vector through the word vector model, represents a word included in the description sentence as a second vector, and then computes the cosine distance between the first vector and the second vector. The greater the cosine distance, the lower the similarity between the first keyword and the word included in the description sentence; conversely, the smaller the cosine distance, the higher the similarity.
  • the terminal may determine words whose cosine distance satisfies the preset condition as the first target keyword.
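  • As a sketch of this matching step, assuming numpy and reusing the w2v model from the earlier word2vec sketch (the distance threshold is an illustrative assumption):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance: small when vectors (and hence words) are similar."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def find_target_keywords(query: str, description_words: list, w2v,
                         max_distance: float = 0.5) -> list:
    """Return the words in a description sentence whose cosine distance
    to the query keyword satisfies the preset condition."""
    q = w2v.wv[query]
    return [w for w in description_words
            if cosine_distance(q, w2v.wv[w]) <= max_distance]

# Usage: match the user's keyword against an image's description sentence.
# matches = find_target_keywords("dog", ["dog", "running", "grass"], w2v)
```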
  • Step 304 Display the search results.
  • the terminal displays the search result on the search result page, and the search result includes the above-mentioned second image.
  • the terminal may sort the second images according to the similarity between the first target keyword and the first keyword.
  • the greater the similarity between the first target keyword and the first keyword, the higher the second image corresponding to the description sentence containing that first target keyword is ranked on the search result page;
  • the smaller the similarity between the first target keyword and the first keyword, the lower the second image corresponding to the description sentence containing that first target keyword is ranked on the search result page.
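  • The ranking rule itself is a one-liner; a sketch assuming a hypothetical similarity_of function that returns the precomputed keyword similarity for each second image:

```python
def rank_results(second_images: list, similarity_of) -> list:
    """Sort second images so that higher keyword similarity ranks first
    on the search result page."""
    return sorted(second_images, key=similarity_of, reverse=True)
```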
  • the technical solution provided by the embodiments of the present application performs image search through the image index generated according to the above embodiments; the user only needs to input words included in the index, or words whose meanings are similar to those words, and the terminal can accurately find the image according to the words entered by the user, improving the efficiency of searching for images in the album.
  • when the user inputs the first keyword and the terminal finds a large number of second images based on it, the user has to filter out the desired image among the many second images, and the search efficiency is still relatively low.
  • FIG. 4 shows a flowchart of an image search method provided by another embodiment of the present application.
  • the image search method can be used to solve the problem of low search efficiency when there are many second images searched according to the first keyword.
  • the method includes the following steps:
  • Step 401 Display a search box.
  • Step 402 Receive the first keyword entered in the search box.
  • Step 403 Search the album for the second image that matches the first keyword.
  • Step 404 When the number of second images is greater than the preset number, a prompt message is displayed.
  • the preset number can be set according to actual needs, which is not limited in the embodiments of the present application.
  • for example, the preset number is 10.
  • the prompt information is used to prompt the input of the second keyword.
  • the second keyword is different from the first keyword.
  • when the terminal finds the second images matching the first keyword, it first detects whether the number of second images is greater than the preset number. If the number is less than or equal to the preset number, the second images are displayed directly. If the number is greater than the preset number, the user is prompted to enter more keywords, so that the terminal continues to filter, among the second images matching the first keyword, for third images that match both the first keyword and the second keyword.
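  • A compact sketch of this two-stage flow, building on the hypothetical find_target_keywords helper above; the preset number of 10 follows the example in the text, and the album data shape (paths mapped to description-word lists) is an assumption:

```python
MAX_RESULTS = 10  # preset number from the example above

def search_album(album: dict, first_kw: str, w2v, ask_second_keyword) -> list:
    """album maps image paths to description sentences (as word lists).
    Returns matching image paths, narrowing with a second keyword if needed."""
    second_images = [path for path, words in album.items()
                     if find_target_keywords(first_kw, words, w2v)]
    if len(second_images) <= MAX_RESULTS:
        return second_images
    # Too many results: prompt for a second keyword and filter again.
    second_kw = ask_second_keyword()
    return [path for path in second_images
            if find_target_keywords(second_kw, album[path], w2v)]
```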
  • Step 405 Obtain the second keyword.
  • the second keyword is also input by the user, which is different from the first keyword.
  • the above prompt information includes an input box for the user to input the second keyword, and the user can input the second keyword in the input box, so that the terminal obtains the second keyword.
  • Step 406 Search for a third image matching the second keyword in the second image.
  • the index corresponding to the third image includes the second target keyword.
  • the second target keyword matches the second keyword.
  • the similarity between the second target keyword and the second keyword meets the second preset condition.
  • the second preset condition may be that the similarity between the second target keyword and the second keyword is greater than a preset threshold, and the preset threshold may be set according to actual requirements, which is not limited in this embodiment of the present application.
  • the terminal first calculates the similarity between the words included in each description sentence stored by the terminal and the first keyword, as well as the similarity between those words and the second keyword; it then determines the words whose similarity to the first keyword meets the first preset condition as first target keywords, and the words whose similarity to the second keyword meets the second preset condition as second target keywords; finally, it takes the images corresponding to the description sentences containing both a first target keyword and a second target keyword as the third images that match both the first keyword and the second keyword.
  • for the calculation method of the similarity between the second keyword and the words included in the description sentence, reference may be made to step 303, and details are not described here.
  • in another example, the terminal calculates the similarity between the words included in the second images' description sentences and the second keyword, determines the words whose similarity to the second keyword meets the second preset condition as second target keywords, and determines the images among the second images whose indexes include a second target keyword as the third images.
  • Step 407 Display the search results.
  • the search result includes the above-mentioned third image.
  • the technical solution provided by the embodiments of the present application prompts the user to input more keywords when there are too many search results, so that the terminal can perform the image search based on the two separately entered keywords, thereby improving the accuracy of the image search.
  • the language description model is pre-trained, and is a model for encoding at least two words into a complete sentence.
  • the following describes the training process of the language description model.
  • Step 501 Obtain a training sample set.
  • the training sample set includes multiple sample images, and each sample image corresponds to an expected description sentence for its recognition result.
  • the recognition result corresponding to the sample image can be annotated manually or obtained through the image recognition model; the expected description sentence may be annotated manually.
  • Step 502 For the sample image, process the recognition result through the language description model, and output the actual description sentence.
  • the language description model may be a deep learning network, such as an AlexNet network, a VGG-16 network, a GoogLeNet network, or a Deep Residual Learning (ResNet) network.
  • the parameters of the language description model are initialized.
  • the parameters of the language description model may be set randomly, or may be set by relevant technical personnel based on experience.
  • each sample image is input into a language description model, and the language description model outputs an actual description sentence.
  • Step 503 Calculate the error between the actual description sentence and the expected description sentence.
  • the terminal determines the distance between the actual description sentence and the expected description sentence as an error.
  • after calculating the error between the actual description sentence and the expected description sentence, the terminal detects whether the error is greater than a preset threshold. If the error is greater than the preset threshold, the parameters of the language description model are adjusted, and execution resumes from the step of processing each sample image through the language description model and outputting the actual description sentence, that is, steps 502 and 503 are repeated. When the error is less than or equal to the preset threshold, the training is stopped, and the trained language description model is obtained.
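  • The source describes this loop abstractly (adjust parameters while the error exceeds a threshold); a minimal sketch, assuming PyTorch and a hypothetical caption model, dataset, and sentence-level loss function:

```python
import torch

def train_language_model(model, dataset, loss_fn, threshold: float = 0.1,
                         lr: float = 1e-3, max_epochs: int = 100):
    """Repeat steps 502-503: output actual sentences, compare them with the
    expected sentences, and adjust parameters until the error is within the
    threshold. Threshold, lr, and max_epochs are illustrative values."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        total_error = 0.0
        for recognition_vectors, expected_sentence in dataset:
            actual_sentence = model(recognition_vectors)         # step 502
            error = loss_fn(actual_sentence, expected_sentence)  # step 503
            optimizer.zero_grad()
            error.backward()
            optimizer.step()
            total_error += error.item()
        if total_error / len(dataset) <= threshold:
            break  # error within threshold: training complete
    return model
```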
  • FIG. 5 shows a block diagram of an image index generation device provided by an embodiment of the present application.
  • the device has the function of implementing the above method, and the function can be realized by hardware, or can be realized by hardware executing corresponding software.
  • the device may be a terminal or may be provided on the terminal.
  • the device includes:
  • the image acquisition module 601 is used to acquire the first image.
  • the image recognition module 602 is configured to perform image recognition on the first image to obtain a recognition result corresponding to the first image.
  • the sentence generating module 603 is configured to generate a description sentence according to the recognition result, and the description sentence is used to describe the first image.
  • the index generation module 604 is configured to determine the description sentence as an index of the first image, and store the index corresponding to the first image.
  • the technical solution provided by the embodiments of the present application recognizes the recognition results corresponding to each object included in the image, generates a description sentence describing the image according to the recognition results, and determines the above description sentence as the index of the image; when the user later needs to search for the image, he can input words included in the index, or words whose meanings are similar to those words, and the terminal can accurately find the image according to the words entered by the user, improving the efficiency of searching for images in the album.
  • the sentence generation module 603 is used to:
  • the first word vector is processed through a language description model to obtain the description sentence.
  • the device further includes: an information acquisition module (not shown in the figure).
  • the information acquisition module is used to acquire the associated information of the first image, the associated information includes at least one of the following: location information, time information, scene information;
  • the sentence generation module 603 is used to:
  • the first word vector and the second word vector are processed through a language description model to obtain the description sentence.
  • the device further includes: an information display module (not shown in the figure).
  • the information display module is used to display query information, and the query information is used to query whether the description sentence is determined as the index;
  • the index generation module 604 is further configured to, when receiving the confirmation instruction corresponding to the inquiry information, perform the step of determining the description sentence as an index of the first image and storing the index in correspondence with the first image.
  • the device further includes an input box display module and a sentence receiving module (not shown in the figure).
  • the input box display module is used to display the input box when the confirmation instruction is not received
  • a sentence receiving module configured to receive a sentence input in the input box
  • the index generation module 604 is further configured to determine the input sentence as an index of the first image, and store the index in correspondence with the first image.
  • the image recognition module is configured to:
  • the image recognition model is a neural network model trained by using multiple sample images, and the object in each sample image of the multiple sample images corresponds to a classification label.
  • the device further includes: a sample set acquisition module, a sentence output module, an error calculation module, and a model training module (not shown in the figure).
  • a sample set acquisition module, used to acquire a training sample set, the training sample set including a plurality of sample images, each sample image corresponding to an expected description sentence for its recognition result;
  • the sentence output module is used to process the recognition result through the language description model for the sample image and output the actual description sentence;
  • An error calculation module used to calculate the error between the actual description sentence and the expected description sentence
  • the model training module is used to adjust the parameters of the language description model when the error is greater than a preset threshold and to resume execution from the step of processing each sample image through the language description model and outputting the actual description sentence, until the error is less than or equal to the preset threshold, at which point the training stops and the trained language description model is obtained, the language description model being used to generate the description sentence according to the recognition result.
  • FIG. 6 shows a block diagram of an image search apparatus provided by an embodiment of the present application.
  • the device has the function of implementing the above method, and the function can be realized by hardware, or can be realized by hardware executing corresponding software.
  • the device may be a terminal or may be provided on the terminal.
  • the device includes:
  • the search box display module 710 is used to display the search box.
  • the keyword receiving module 720 is configured to receive the first keyword input in the search box.
  • the image search module 730 is configured to search an album for a second image matching the first keyword, where the index corresponding to the second image includes a first target keyword, the first target keyword matches the first keyword, and the index corresponding to the second image is a description sentence generated according to the recognition result of the second image.
  • the result display module 740 is configured to display search results, and the search results include the second image.
  • the technical solution provided by the embodiments of the present application prompts the user to input more keywords when there are too many search results, so that the terminal can perform the image search based on the two separately entered keywords, thereby improving the accuracy of the image search.
  • the device further includes: an information display module and a keyword acquisition module (not shown in the figure).
  • the information display module is configured to display prompt information when the number of the second images is greater than a preset number, and the prompt information is used to prompt the input of the second keyword.
  • the keyword acquisition module is used to acquire the second keyword.
  • the image search module is further configured to search for a third image matching the second keyword in the second image, and an index corresponding to the third image includes a second target keyword, the second The target keyword matches the second keyword;
  • the search result includes the third image.
  • when the device provided in the above embodiments implements its functions, the division into the above functional modules is merely used as an example; in practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • the device and method embodiments provided in the above embodiments belong to the same concept. For the specific implementation process, see the method embodiments, and details are not described here.
  • FIG. 7 shows a structural block diagram of a terminal provided by an exemplary embodiment of the present application.
  • the terminal in this application may include one or more of the following components: a processor 610 and a memory 620.
  • the processor 610 may include one or more processing cores.
  • the processor 610 connects various parts of the entire terminal using various interfaces and lines, and performs the terminal's various functions and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 620 and by calling data stored in the memory 620.
  • the processor 610 may be implemented in at least one of the following hardware forms: digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA).
  • the processor 610 may integrate one or a combination of a central processing unit (CPU) and a modem. Among them, the CPU mainly handles the operating system, applications, and so on, while the modem handles wireless communication. It can be understood that the above modem may not be integrated into the processor 610 and may instead be implemented by a separate chip.
  • when the processor 610 executes the program instructions in the memory 620, the image index generation method or the image search method provided by the foregoing method embodiments is implemented.
  • the memory 620 may include random access memory (Random Access Memory, RAM) or read-only memory (Read-Only Memory, ROM).
  • the memory 620 includes a non-transitory computer-readable storage medium.
  • the memory 620 may be used to store instructions, programs, codes, code sets, or instruction sets.
  • the memory 620 may include a storage program area and a storage data area, where the storage program area may store instructions for implementing an operating system, instructions for at least one function, instructions for implementing various method embodiments described above, etc.; storage data area It can store data created according to the use of the terminal.
  • the structure of the above terminal is only schematic. In actual implementation, the terminal may include more or fewer components, such as a display screen, etc., which is not limited in this embodiment.
  • FIG. 7 does not constitute a limitation on the terminal 600, which may include more or fewer components than illustrated, or combine certain components, or adopt different component arrangements.
  • An exemplary embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, which, when loaded and executed by a processor, implements the image index generation method or image search method provided by the above method embodiments.
  • An exemplary embodiment of the present application also provides a computer program product containing instructions, which when executed on a computer, causes the computer to execute the image index generation method or the image search method described in the above embodiments.
  • the program may be stored in a computer-readable storage medium.
  • the mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An image index generation method, an image search method, an apparatus, a terminal, and a medium. The method includes: acquiring a first image (101); performing image recognition on the first image to obtain a recognition result corresponding to the first image (102); generating a description sentence according to the recognition result (103); and determining the description sentence as an index of the first image and storing the index in correspondence with the first image (104). By recognizing the recognition results corresponding to the objects included in an image, generating a description sentence for the image according to those recognition results, and determining the description sentence as the image's index, the method allows a user who later needs to search for the image to input words included in the index, or words similar in meaning to them; the terminal can then accurately locate the image according to the words input by the user, improving the efficiency of searching for images in the album.

Description

Image index generation method, image search method, apparatus, terminal, and medium
This application claims priority to Chinese Patent Application No. 201811457455.0, entitled "Image index generation method, apparatus, terminal, and storage medium", filed on November 30, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of search technology, and in particular to an image index generation method, an image search method, an apparatus, a terminal, and a medium.
Background
At present, a photo album application is usually installed in a terminal, and the photo album application is generally used to store captured images, images saved from the network, and the like.
When many images are saved in the album, a user who needs to find a desired image among them must browse the various album directories in the terminal and locate the image in the corresponding directory.
Summary
Embodiments of this application provide an image index generation method, an image search method, an apparatus, a terminal, and a medium. The technical solution is as follows:
In one aspect, an image index generation method is provided, the method including:
acquiring a first image;
performing image recognition on the first image to obtain a recognition result corresponding to the first image;
generating a description sentence according to the recognition result, the description sentence being used to describe the first image; and
determining the description sentence as an index of the first image, and storing the index in correspondence with the first image.
In another aspect, an image search method is provided, the method including:
displaying a search box;
receiving a first keyword input in the search box;
searching an album for a second image matching the first keyword, where an index corresponding to the second image includes a first target keyword, the first target keyword matches the first keyword, and the index corresponding to the second image is a description sentence generated according to a recognition result of the second image; and
displaying a search result, the search result including the second image.
In another aspect, an image index generation apparatus is provided, the apparatus including:
an image acquisition module, configured to acquire a first image;
an image recognition module, configured to perform image recognition on the first image to obtain a recognition result corresponding to the first image;
a sentence generation module, configured to generate a description sentence according to the recognition result, the description sentence being used to describe the first image; and
an index generation module, configured to determine the description sentence as an index of the first image, and store the index in correspondence with the first image.
In yet another aspect, an image search apparatus is provided, the apparatus including:
a search box display module, configured to display a search box;
a keyword receiving module, configured to receive a first keyword input in the search box;
an image search module, configured to search an album for a second image matching the first keyword, where an index corresponding to the second image includes a first target keyword, the first target keyword matches the first keyword, and the index corresponding to the second image is a description sentence generated according to a recognition result of the second image; and
a result display module, configured to display a search result, the search result including the second image.
In yet another aspect, an embodiment of this application provides a terminal, the terminal including a processor and a memory, the memory storing a computer program that is loaded and executed by the processor to implement the above image index generation method or the above image search method.
In yet another aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program that is loaded and executed by a processor to implement the above image index generation method or the above image search method.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the following briefly introduces the drawings required for describing the embodiments. Apparently, the drawings in the following description show merely some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of an image index generation method provided by an embodiment of this application;
FIG. 2 is a flowchart of an image index generation method provided by another embodiment of this application;
FIG. 3 is a flowchart of an image search method provided by an embodiment of this application;
FIG. 4 is a flowchart of an image search method provided by another embodiment of this application;
FIG. 5 is a block diagram of an image index generation apparatus provided by an embodiment of this application;
FIG. 6 is a block diagram of an image search apparatus provided by an embodiment of this application;
FIG. 7 is a block diagram of a terminal provided by an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the following further describes the implementations of this application in detail with reference to the accompanying drawings.
Embodiments of this application provide an image index generation method, apparatus, terminal, and storage medium. By recognizing the recognition results corresponding to the objects included in an image and using a language description model to generate a description sentence that includes those recognition results and describes the image, the description sentence is determined as the image's index. Later, when the user needs to search for the image, the user may input words included in the index, or words similar in meaning to them, and the terminal can accurately locate the image according to the input words, improving the efficiency of searching for images in the album.
In the technical solutions provided by the embodiments of this application, each step is performed by a terminal. Optionally, a photo album application is installed in the terminal; the photo album application is an application for storing images. The images may be images captured by the user (including photos and videos) or images saved by the user from other applications (including photos and videos). The terminal may be a mobile phone, a tablet computer, a personal computer, a smart wearable device, a camera, a smart playback device, and so on.
An embodiment of this application provides an image index generation method, the method including:
acquiring a first image;
performing image recognition on the first image to obtain a recognition result corresponding to the first image;
generating a description sentence according to the recognition result, the description sentence being used to describe the first image;
determining the description sentence as an index of the first image, and storing the index in correspondence with the first image.
Optionally, the generating a description sentence according to the recognition result includes:
converting the recognition result into a first word vector;
processing the first word vector through a language description model to obtain the description sentence.
Optionally, after the acquiring a first image, the method further includes:
acquiring associated information of the first image, the associated information including at least one of the following: location information, time information, and scene information;
the generating a description sentence according to the recognition result includes:
converting the recognition result into a first word vector;
converting the associated information into a second word vector;
processing the first word vector and the second word vector through a language description model to obtain the description sentence.
Optionally, before the determining the description sentence as an index of the first image and storing the index in correspondence with the first image, the method further includes:
displaying inquiry information, the inquiry information being used to ask whether to determine the description sentence as the index;
when a confirmation instruction corresponding to the inquiry information is received, performing the step of determining the description sentence as an index of the first image and storing the index in correspondence with the first image.
Optionally, after the displaying inquiry information, the method further includes:
when the confirmation instruction is not received, displaying an input box;
receiving a sentence input in the input box;
determining the input sentence as an index of the first image, and storing the index in correspondence with the first image.
Optionally, the performing image recognition on the first image to obtain a recognition result corresponding to the first image includes:
performing image recognition on the first image through an image recognition model to obtain recognition results respectively corresponding to at least one object in the first image;
where the image recognition model is a neural network model trained using multiple sample images, and the object in each of the multiple sample images corresponds to a classification label.
Optionally, before the generating a description sentence according to the recognition result, the method further includes:
acquiring a training sample set, the training sample set including multiple sample images, each sample image corresponding to an expected description sentence for its recognition result;
for each sample image, processing the recognition result through a language description model and outputting an actual description sentence;
calculating the error between the actual description sentence and the expected description sentence;
when the error is greater than a preset threshold, adjusting the parameters of the language description model and resuming execution from the step of processing each sample image through the language description model and outputting the actual description sentence, until the error is less than or equal to the preset threshold, at which point the training stops and the trained language description model is obtained, the language description model being used to generate the description sentence according to the recognition result.
An embodiment of this application further provides an image search method, the method including:
displaying a search box;
receiving a first keyword input in the search box;
searching an album for a second image matching the first keyword, where an index corresponding to the second image includes a first target keyword, the first target keyword matches the first keyword, and the index corresponding to the second image is a description sentence generated according to a recognition result of the second image;
displaying a search result, the search result including the second image.
Optionally, before the displaying a search result, the method further includes:
when the number of second images is greater than a preset number, displaying prompt information, the prompt information being used to prompt input of a second keyword;
acquiring the second keyword;
searching the second images for a third image matching the second keyword, where an index corresponding to the third image includes a second target keyword, and the second target keyword matches the second keyword;
where the search result includes the third image.
Please refer to FIG. 1, which shows a flowchart of an image index generation method provided by an embodiment of this application. The method may include the following steps:
Step 101: Acquire a first image.
In one possible implementation, the first image may be an image captured by a camera on the terminal. Optionally, the terminal is provided with a camera and has a shooting application installed; a shooting application is an application used to capture images, for example, a camera application, a beauty application, or another application. When the shooting application is running and the terminal receives a trigger signal of a shooting control on the current shooting interface, it acquires the image captured by the camera as the first image.
In another possible implementation, the first image may not be an image captured by a camera on the terminal, but rather an image saved by the user from another application. Optionally, the first image is an image obtained from the network or a screenshot. Optionally, when an image is displayed on the terminal's display interface and the terminal receives a save instruction corresponding to the image, it acquires the image from the network as the first image according to the save instruction.
In addition, the embodiments of this application do not limit the manner or timing of acquiring the first image.
Step 102: Perform image recognition on the first image to obtain a recognition result corresponding to the first image.
The recognition result corresponding to the first image is used to indicate the objects included in the first image; for example, the first image may include one or more objects, such as people, animals, buildings, landscapes, and so on. In the embodiments of this application, the terminal determines the category to which each object belongs through the following step; the category is used to indicate the specific class of the object, for example, whether the object is a cat, a dog, grass, a person, or another category: image recognition is performed on the first image through an image recognition model to obtain recognition results respectively corresponding to at least one object in the first image.
The image recognition model is a neural network model trained using multiple sample images; for example, the image recognition model may be obtained by training a deep learning network with multiple sample images. The object in each of the multiple sample images corresponds to a classification label, and the classification label is used to characterize the category to which the object belongs. In some embodiments of this application, the image recognition model includes: an input layer, at least one convolutional layer (for example, three convolutional layers: a first, a second, and a third convolutional layer), at least one fully connected layer (for example, two fully connected layers: a first and a second fully connected layer), and an output layer. The input data of the input layer is the first image, and the output of the output layer is the classification to which each of the at least one object included in the first image belongs. The image recognition process is as follows: the first image is input to the input layer of the image recognition model; the convolutional layers of the model extract features of the first image; the fully connected layers then combine and abstract these features to obtain data suitable for classification by the output layer; and finally the output layer outputs the recognition results respectively corresponding to the at least one object included in the first image.
In the embodiments of this application, the specific structures of the convolutional layers and fully connected layers of the image recognition model are not limited; the image recognition model shown in the above embodiment is merely exemplary and explanatory and is not intended to limit this application. In general, the more layers a convolutional neural network has, the better the effect, but the longer the computation time; in practical applications, a convolutional neural network with an appropriate number of layers can be designed in light of the requirements for recognition accuracy and efficiency.
A sample image is an image selected in advance for training the image recognition model. A sample image has a classification label; the classification label of a sample image is usually determined manually and is used to describe the scene, item, person, and so on corresponding to the sample image.
Optionally, the neural network may be a deep learning network; the deep learning network may use an AlexNet network, a VGG-16 network, a GoogLeNet network, a Deep Residual Learning network, or the like, which is not limited in the embodiments of this application. In addition, the algorithm used to train the deep learning network may be BP (Back-Propagation), the Faster R-CNN (Regions with Convolutional Neural Network) algorithm, or the like, which is also not limited in the embodiments of this application.
The following explains the training process of the image recognition model, taking BP as the training algorithm: first, initialize the parameters of each layer of the deep learning network; next, input a sample image into the deep learning network to obtain the recognition result corresponding to the sample image; then compare the recognition result with the classification label to obtain the error between them; finally, adjust the parameters of each layer of the deep learning network based on this error, and repeat the above steps until the error between the recognition result and the classification label is less than a preset value, at which point the trained deep learning network, that is, the image recognition model, is obtained.
Step 103: Generate a description sentence according to the recognition result.
The description sentence is used to describe the first image and includes the recognition results respectively corresponding to at least one object. Optionally, the description sentence also includes other words, which may describe at least one of the following: the positional relationship between at least two objects, an action being performed by an object, the state an object is in, and so on. For example, the first image is recognized, and the objects in the first image are found to include a dog and grass, with the dog's posture on the grass being running; the above recognition results are input into the language description model, and the description sentence obtained for the first image is "a dog running on the grass".
In some embodiments of this application, the language description model includes: an input layer, at least one convolutional layer (for example, three convolutional layers: a first, a second, and a third convolutional layer), at least one fully connected layer (for example, two fully connected layers: a first and a second fully connected layer), and an output layer. The input data of the input layer is the first image and the recognition results of the objects in the first image, and the output of the output layer is the description sentence corresponding to the first image. The generation process of the description sentence is as follows: the first image and the recognition results of the objects in the first image are input to the input layer of the language description model; the convolutional layers of the model extract features of the above input; the fully connected layers then combine and abstract these features; and finally the output layer outputs the description sentence corresponding to the first image.
In the embodiments of this application, the specific structures of the convolutional layers and fully connected layers of the language description model are not limited; the language description model shown in the above embodiment is merely exemplary and explanatory and is not intended to limit this application. In general, the more layers a convolutional neural network has, the better the effect, but the longer the computation time; in practical applications, a convolutional neural network with an appropriate number of layers can be designed in light of the requirements for computational accuracy and efficiency.
Optionally, step 103 may include the following sub-steps:
In one example, step 103 may be implemented as:
Step 103a: Convert the recognition result into a first word vector;
Step 103b: Process the first word vector through the language description model to obtain the description sentence.
In the embodiments of this application, the terminal converts the recognition result into a corresponding word vector through a word vector model; a word vector is a vector that represents a word, and a word vector model is a model that converts words into word vectors. The word vector is then input into the language description model, which outputs the description sentence. The word vector model may be a word2vec model.
In another example, the terminal may also acquire associated information of the first image. In this case, step 103 may also be implemented as:
1. Convert the recognition result into a first word vector;
2. Convert the associated information into a second word vector;
In the embodiments of this application, the associated information includes at least one of the following: location information, time information, and scene information. Location information indicates the geographic location where the first image was taken, for example, Shanghai, Beijing, Canada, and so on; time information indicates when the first image was acquired, for example, spring, summer, autumn, winter, early morning, evening, and so on; scene information indicates the scene corresponding to the first image, for example, a park, a beach, a shopping mall, a school, and so on. The terminal can convert the associated information into a corresponding word vector through the word vector model.
3. Process the first word vector and the second word vector through the language description model to obtain the description sentence.
The terminal inputs the first word vector and the second word vector into the language description model, so that the finally generated description sentence is richer.
By way of example, the following description takes location information as the associated information.
First, acquire the location information of the first image.
Second, convert the location information into a second word vector;
Third, process the first word vector and the second word vector through the language description model to obtain the description sentence.
The location information indicates the geographic location where the first image was taken. When the first image is captured by the terminal's camera, the location information can be obtained by a positioning component in the terminal, for example, a GPS (Global Positioning System) component. Of course, in other possible implementations, the terminal may also obtain the location information of the first image by performing image recognition on it. For the manner of converting the location information into a word vector, refer to step 103a, which will not be repeated here. In the embodiments of this application, generating the description sentence for the first image in combination with the geographic location where it was taken describes the first image more completely, so that the user can later search for it with multiple different keywords, improving the convenience of searching.
For example, the first image is recognized, and the objects in it are found to include a dog and grass, with the dog's posture on the grass being running; in addition, the geographic location where the first image was taken is XX Park. The description sentence corresponding to the first image is then "a dog running on the grass in XX Park".
Step 104: Determine the description sentence as the index of the first image, and store the index in correspondence with the first image.
The terminal determines the description sentence as the index of the first image and stores the index in correspondence with the first image. Later, if the user needs to find the first image, the user only needs to input at least one word included in the description sentence, or a word matching a word in the description sentence (for example, a word whose similarity to a word in the description sentence is greater than a preset threshold); the terminal can then find the first image according to the word input by the user and display it to the user.
In addition, the embodiments of this application do not limit the path for storing the description sentence and the first image; it may be preset by the terminal or customized by the user.
In summary, in the technical solution provided by the embodiments of this application, the recognition results corresponding to the objects included in an image are recognized, a description sentence describing the image is generated according to the recognition results, and the description sentence is determined as the image's index. Later, when the user needs to search for the image, the user may input words included in the index, or words similar in meaning to them; the terminal can accurately find the image according to the words input by the user, improving the efficiency of searching for images in the album.
In addition, by generating the description sentence for an image according to its recognition result and determining the description sentence as the image's index, the generated index is accurate.
Please refer to FIG. 2, which shows a flowchart of an image index generation method provided by another embodiment of this application. The method may include the following steps:
Step 201: Acquire a first image.
Step 202: Perform image recognition on the first image to obtain a recognition result corresponding to the first image.
Step 203: Generate a description sentence according to the recognition result.
Step 204: Display inquiry information.
In the embodiments of this application, the inquiry information is used to ask whether to determine the description sentence as the index. For example, the inquiry information is: "The description sentence corresponding to this image is 'watching a concert at the Bird's Nest'. Confirm?".
In the embodiments of this application, the user can preview the description sentence generated by the language description model and decide whether to determine it as the index of the first image.
Step 205: When a confirmation instruction corresponding to the inquiry information is received, determine the description sentence as the index of the first image, and store the index in correspondence with the first image.
If the user decides to use the generated description sentence as the image's index, the user can issue a confirmation instruction for the inquiry information. The confirmation instruction corresponding to the inquiry information is used to confirm that the generated description sentence is determined as the image's index. Optionally, a confirmation control is displayed beside the inquiry information; when the terminal receives a trigger signal acting on the confirmation control, the terminal receives the confirmation instruction corresponding to the inquiry information.
Step 206: When the confirmation instruction is not received, display an input box.
The input box is used to receive a description sentence for the first image input by the user. Optionally, if the terminal does not receive a trigger signal acting on the confirmation control within a preset time, the terminal has not received the confirmation instruction. Optionally, a denial control is also displayed beside the inquiry information; when the terminal receives a trigger signal corresponding to the denial control, the terminal has not received the confirmation instruction, and it may display the input box at this point.
Step 207: Receive the sentence input in the input box.
In the embodiments of this application, when the user is not satisfied with the generated description sentence, the user can input the description sentence for the target image himself.
Step 208: Determine the input sentence as the index of the first image, and store the index in correspondence with the first image.
In summary, in the technical solution provided by the embodiments of this application, the user judges whether to confirm the generated description sentence as the image's index, and if the user is not satisfied with the description sentence generated by the terminal, the user inputs the description sentence corresponding to the image himself, so that the user can later search for the image according to the self-input description sentence, improving the accuracy of the index and thus the final image indexing efficiency. After the index of the first image is generated, the user can search for the first image in the album according to the index. The search process is explained below. In an optional embodiment based on the embodiment shown in FIG. 1 or FIG. 2, after step 104, or after step 208, as shown in FIG. 3, an embodiment of this application further provides an image search method, which may include the following steps:
步骤301,显示搜索框。
搜索框用于供用户输入搜索关键字,以使得终端能够查找与该搜索关键字相匹配的图像。在一种可能的实现方式中,相册应用程序的主界面中显示有该搜索框。在另一种可能的实现方式中,相册应用程序的主界面显示有搜索控件,当用户触发该搜索控件时,终端接收到对应于该搜索控件的触发信号,并根据该触发信号显示搜索框。本申请实施例对搜索框的显示 方式不作限定。
步骤302,接收在搜索框输入的第一关键字。
第一关键字由用户输入,其可以是“故宫”、“猫”“玫瑰花”等等,本申请实施例对此不作限定。
Step 303: search the album for second images matching the first keyword.
The number of second images may be one or more. The index corresponding to a second image is used to describe the second image; the index corresponding to the second image is a description sentence generated according to the recognition result of the second image, and the index corresponding to the second image includes a first target keyword. The first target keyword may be a recognition result corresponding to an object included in the second image, or another word in the description sentence other than the recognition result; the embodiments of the present application do not limit this. In the above manner, the user can search for the same image through different keywords, which reduces the difficulty of searching for images.
Illustratively, the first target keyword matches the first keyword; for example, the similarity between the first target keyword and the first keyword meets a preset condition. The preset condition may be that the similarity between the first target keyword and the first keyword is greater than a preset threshold; the preset threshold can be set according to actual requirements, which is not limited in the embodiments of the present application.
Optionally, the terminal first calculates the similarity between each word included in the description sentences stored in the terminal and the first keyword, then determines a word whose similarity to the first keyword meets the preset condition as the first target keyword, and finally takes the image corresponding to the description sentence containing the first target keyword as a second image matching the first keyword.
In addition, the similarity between the first keyword and a word included in a description sentence can be calculated as follows: the terminal represents the first keyword as a first vector through the word vector model and represents the word included in the description sentence as a second vector, and then calculates the similarity between the first keyword and the word by computing the cosine distance between the first vector and the second vector. The larger the cosine distance, the lower the similarity between the first keyword and the word included in the description sentence; conversely, the smaller the cosine distance, the higher the similarity. The terminal can then determine a word whose cosine distance meets the preset condition as the first target keyword.
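Illustratively, the cosine-distance test described above could be sketched as follows; the threshold value and the helper names are assumptions made for illustration.

import numpy as np

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity; smaller means more similar.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def matches(w2v, query, sentence_words, max_distance=0.4):
    # Return the words of a stored description sentence whose cosine
    # distance to the query keyword is small enough (target keywords).
    q = w2v.wv[query]
    return [w for w in sentence_words
            if w in w2v.wv and cosine_distance(q, w2v.wv[w]) <= max_distance]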
Step 304: display search results.
The terminal displays the search results in a search results page, and the search results include the above second images. When there are multiple second images, the terminal may sort the second images according to the similarity between the first target keyword and the first keyword. Optionally, the greater the similarity between the first target keyword and the first keyword, the closer to the front the second image corresponding to the description sentence containing that first target keyword is ranked in the search results page; the smaller the similarity, the closer to the back the corresponding second image is ranked.
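Illustratively, the ranking rule could be sketched as a descending sort on similarity; the (image path, similarity) pair representation is an assumption.

def rank_results(results):
    # results: list of (image_path, similarity) pairs; higher similarity first.
    return [path for path, sim in sorted(results, key=lambda r: r[1], reverse=True)]

# e.g. rank_results([("a.jpg", 0.78), ("b.jpg", 0.91)]) -> ["b.jpg", "a.jpg"]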
In summary, in the technical solutions provided by the embodiments of the present application, image search is performed according to the image indexes generated in the above embodiments: the user only needs to input a word included in the index, or a word whose meaning is close to a word included in the index, and the terminal can accurately find the image according to the word input by the user, which improves the efficiency of searching for images in an album.
When the user inputs the first keyword and the number of second images found by the terminal according to the first keyword is large, the user then needs to pick out the desired image from many second images, and the search efficiency remains relatively low.
Please refer to FIG. 4, which shows a flowchart of an image search method provided by another embodiment of the present application. This image search method can be used to solve the problem of low search efficiency when many second images are found according to the first keyword. The method includes the following steps:
Step 401: display a search box.
Step 402: receive a first keyword input in the search box.
Step 403: search the album for second images matching the first keyword.
Step 404: when the number of second images is greater than a preset number, display prompt information.
The preset number can be set according to actual requirements, which is not limited in the embodiments of the present application. Illustratively, the preset number is 10. The prompt information is used to prompt input of a second keyword. Optionally, the second keyword is different from the first keyword.
In the embodiments of the present application, when the terminal finds the second images matching the first keyword, it first detects whether the number of second images is greater than the preset number. If the number of second images is less than or equal to the preset number, the second images are displayed directly. If the number of second images is greater than the preset number, the user is prompted to input more keywords, so that the terminal further filters, from the second images matching the first keyword, third images matching both the first keyword and the second keyword.
Step 405: acquire the second keyword.
The second keyword is also input by the user and is different from the first keyword. Illustratively, the prompt information includes an input box for the user to input the second keyword; the user can input the second keyword in the input box, so that the terminal acquires the second keyword.
Step 406: search the second images for third images matching the second keyword.
The index corresponding to a third image includes a second target keyword. The second target keyword matches the second keyword; illustratively, the similarity between the second target keyword and the second keyword meets a second preset condition. The second preset condition may be that the similarity between the second target keyword and the second keyword is greater than a preset threshold; the preset threshold can be set according to actual requirements, which is not limited in the embodiments of the present application.
In one example, the terminal first calculates the similarity between each word included in the description sentences stored in the terminal and the first keyword, as well as the similarity between each word included in the stored description sentences and the second keyword; it then determines a word whose similarity to the first keyword meets a first preset condition as the first target keyword, and a word whose similarity to the second keyword meets the second preset condition as the second target keyword; finally, it takes the image corresponding to the description sentence containing both the first target keyword and the second target keyword as a third image matching both the first keyword and the second keyword. For the manner of calculating the similarity between the second keyword and a word included in a description sentence, reference may be made to step 303, which is not repeated here.
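Illustratively, the two-stage filtering could be sketched as follows, reusing the hypothetical matches helper from the earlier sketch; the data layout is an assumption.

def filter_third_images(w2v, second_images, second_keyword):
    # second_images: list of (image_path, sentence_words) pairs already
    # matched by the first keyword; keep only those whose index also
    # contains a word matching the second keyword (the third images).
    return [path for path, words in second_images
            if matches(w2v, second_keyword, words)]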
In another example, the terminal calculates the similarity between the words included in the indexes of the second images and the second keyword, determines a word whose similarity to the second keyword meets the second preset condition as the second target keyword, and determines, among the second images, the images whose indexes include the second target keyword as the third images.
Step 407: display search results.
In the embodiments of the present application, the search results include the above third images.
In summary, in the technical solutions provided by the embodiments of the present application, when there are too many search results, the user is prompted to input more keywords, so that the terminal can perform the image search according to the keywords input in the two rounds, which improves the accuracy of the image search.
As mentioned in the embodiment of FIG. 1, the language description model is a pre-trained model for encoding at least two words into a complete sentence. The training process of the language description model is explained below.
Step 501: acquire a training sample set.
The training sample set includes a plurality of sample images, and each sample image corresponds to an expected description sentence corresponding to its recognition result. The recognition result corresponding to a sample image may be manually annotated or obtained through the image recognition model. The expected description sentence may be manually annotated.
Step 502: for each sample image, process the recognition result through the language description model to output an actual description sentence.
The language description model may be a deep learning network, for example, an AlexNet network, a VGG-16 network, a GoogLeNet network, or a Deep Residual Learning network. The parameters of the language description model are initialized; optionally, the parameters of the language description model may be set randomly or set empirically by relevant technical personnel. In the embodiments of the present application, each sample image is input into the language description model, and the language description model outputs an actual description sentence.
Step 503: calculate the error between the actual description sentence and the expected description sentence.
Optionally, the terminal determines the distance between the actual description sentence and the expected description sentence as the error.
After the terminal calculates the error between the actual description sentence and the expected description sentence, it detects whether the error is greater than a preset threshold. If the error is greater than the preset threshold, the parameters of the language description model are adjusted, and execution resumes from the step of processing each sample image through the language description model to output an actual description sentence; that is, steps 502 and 503 are repeated until the error is less than or equal to the preset threshold, at which point training stops and the trained language description model is obtained.
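Illustratively, the training loop described above could be sketched as follows, assuming a model that returns per-position word logits; the cross-entropy loss as the sentence "distance", the Adam optimizer, and the threshold value are assumptions, since this application does not fix them.

import torch

def train(model, samples, threshold=0.05, lr=1e-3):
    # samples: list of (image, labels, expected_word_ids) triples.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    while True:
        total = 0.0
        for image, labels, expected_word_ids in samples:
            logits = model(image, labels)            # (1, max_len, vocab_size)
            # Error between actual and expected description sentences.
            loss = loss_fn(logits.squeeze(0), expected_word_ids)
            opt.zero_grad()
            loss.backward()
            opt.step()                               # adjust model parameters
            total += loss.item()
        if total / len(samples) <= threshold:        # error small enough: stop
            return model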
The following are device embodiments of the present application, which can be used to execute the method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
Please refer to FIG. 5, which shows a block diagram of an image index generation device provided by an embodiment of the present application. The device has the function of implementing the above methods; the function may be implemented by hardware, or by hardware executing corresponding software. The device may be the terminal, or may be provided on the terminal. The device includes:
an image acquisition module 601, configured to acquire a first image;
an image recognition module 602, configured to perform image recognition on the first image to obtain a recognition result corresponding to the first image;
a sentence generation module 603, configured to generate a description sentence according to the recognition result, the description sentence being used to describe the first image; and
an index generation module 604, configured to determine the description sentence as the index of the first image, and store the index in correspondence with the first image.
In summary, in the technical solutions provided by the embodiments of the present application, the recognition results corresponding to the respective objects included in an image are identified, a description sentence describing the image is generated according to the recognition results, and the description sentence is determined as the index of the image. Subsequently, when the user needs to search for the image, the user can input a word included in the index, or a word whose meaning is close to a word included in the index, and the terminal can accurately find the image according to the word input by the user, which improves the efficiency of searching for images in an album.
In an optional embodiment provided on the basis of the embodiment shown in FIG. 5, the sentence generation module 603 is configured to:
convert the recognition result into a first word vector; and
process the first word vector through a language description model to obtain the description sentence.
Optionally, the device further includes: an information acquisition module (not shown in the figure).
The information acquisition module is configured to acquire associated information of the first image, the associated information including at least one of the following: location information, time information, and scene information.
The sentence generation module 603 is configured to:
convert the recognition result into a first word vector;
convert the associated information into a second word vector; and
process the first word vector and the second word vector through the language description model to obtain the description sentence.
In an optional embodiment provided on the basis of the embodiment shown in FIG. 5, the device further includes: an information display module (not shown in the figure).
The information display module is configured to display inquiry information, the inquiry information being used to ask whether to determine the description sentence as the index.
The index generation module 604 is further configured to, upon receiving a confirmation indication corresponding to the inquiry information, execute the step of determining the description sentence as the index of the first image and storing the index in correspondence with the first image.
Optionally, the device further includes: an input box display module and a sentence receiving module (not shown in the figure).
The input box display module is configured to display an input box when the confirmation indication is not received.
The sentence receiving module is configured to receive a sentence input in the input box.
The index generation module 604 is further configured to determine the input sentence as the index of the first image and store the index in correspondence with the first image.
In an optional embodiment provided on the basis of the embodiment shown in FIG. 5, the image recognition module is configured to:
perform image recognition on the first image through an image recognition model to obtain recognition results respectively corresponding to at least one object in the first image;
where the image recognition model is a neural network model trained with a plurality of sample images, and the object in each of the plurality of sample images corresponds to a classification label.
Optionally, the device further includes: a sample set acquisition module, a sentence output module, an error calculation module, and a model training module (not shown in the figure).
The sample set acquisition module is configured to acquire a training sample set, the training sample set including a plurality of sample images, each sample image corresponding to an expected description sentence corresponding to its recognition result.
The sentence output module is configured to, for each sample image, process the recognition result through the language description model to output an actual description sentence.
The error calculation module is configured to calculate the error between the actual description sentence and the expected description sentence.
The model training module is configured to, when the error is greater than a preset threshold, adjust the parameters of the language description model and resume execution from the step of, for each sample image, processing the recognition result through the language description model to output an actual description sentence, until the error is less than or equal to the preset threshold, at which point training stops and the trained language description model is obtained, the language description model being used to generate the description sentence according to the recognition result.
Please refer to FIG. 6, which shows a block diagram of an image search device provided by an embodiment of the present application. The device has the function of implementing the above methods; the function may be implemented by hardware, or by hardware executing corresponding software. The device may be the terminal, or may be provided on the terminal. The device includes:
a search box display module 710, configured to display a search box;
a keyword receiving module 720, configured to receive a first keyword input in the search box;
an image search module 730, configured to search an album for second images matching the first keyword, the index corresponding to a second image including a first target keyword, the first target keyword matching the first keyword, and the index corresponding to the second image being a description sentence generated according to the recognition result of the second image; and
a result display module 740, configured to display search results, the search results including the second images.
In summary, in the technical solutions provided by the embodiments of the present application, image search is performed according to the image indexes generated in the above embodiments: the user only needs to input a word included in the index, or a word whose meaning is close to a word included in the index, and the terminal can accurately find the image according to the word input by the user, which improves the efficiency of searching for images in an album.
Optionally, the device further includes: an information display module and a keyword acquisition module (not shown in the figure).
The information display module is configured to display prompt information when the number of second images is greater than a preset number, the prompt information being used to prompt input of a second keyword.
The keyword acquisition module is configured to acquire the second keyword.
The image search module is further configured to search the second images for third images matching the second keyword, the index corresponding to a third image including a second target keyword, the second target keyword matching the second keyword;
where the search results include the third images.
It should be noted that, when the devices provided by the above embodiments implement their functions, the division into the above functional modules is merely used as an example for illustration. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the equipment may be divided into different functional modules to complete all or part of the functions described above. In addition, the devices provided by the above embodiments belong to the same concept as the method embodiments; for the specific implementation process, refer to the method embodiments, which is not repeated here.
Referring to FIG. 7, a structural block diagram of a terminal provided by an exemplary embodiment of the present application is shown. The terminal in the present application may include one or more of the following components: a processor 610 and a memory 620.
The processor 610 may include one or more processing cores. The processor 610 connects the various parts of the entire terminal using various interfaces and lines, and executes the various functions of the terminal and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 620 and invoking data stored in the memory 620. Optionally, the processor 610 may be implemented in at least one of the following hardware forms: digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA). The processor 610 may integrate one or a combination of several of a central processing unit (CPU), a modem, and the like. The CPU mainly handles the operating system, application programs, and so on; the modem is used to handle wireless communication. It can be understood that the above modem may also not be integrated into the processor 610 and may instead be implemented separately by a single chip.
Optionally, when the processor 610 executes the program instructions in the memory 620, the image index generation method or the image search method provided by each of the above method embodiments is implemented.
The memory 620 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 620 includes a non-transitory computer-readable storage medium. The memory 620 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 620 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for at least one function, instructions for implementing each of the above method embodiments, and the like; the data storage area may store data created according to the use of the terminal, and the like.
The above terminal structure is merely illustrative; in actual implementation, the terminal may include more or fewer components, such as a display screen, which is not limited in this embodiment.
Those skilled in the art can understand that the structure shown in FIG. 7 does not constitute a limitation on the terminal 600, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
An exemplary embodiment of the present application further provides a computer-readable storage medium having a computer program stored thereon; when the computer program is loaded and executed by a processor, the image index generation method or the image search method provided by each of the above method embodiments is implemented.
An exemplary embodiment of the present application further provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the image index generation method or the image search method described in each of the above embodiments.
It should be understood that "a plurality of" mentioned herein means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: A alone, both A and B, and B alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
Those of ordinary skill in the art can understand that all or part of the steps for implementing the above embodiments may be completed by hardware, or by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only optional embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within the protection scope of the present application.

Claims (20)

  1. An image index generation method, characterized in that the method comprises:
    acquiring a first image;
    performing image recognition on the first image to obtain a recognition result corresponding to the first image;
    generating a description sentence according to the recognition result, the description sentence being used to describe the first image; and
    determining the description sentence as an index of the first image, and storing the index in correspondence with the first image.
  2. The method according to claim 1, characterized in that the generating a description sentence according to the recognition result comprises:
    converting the recognition result into a first word vector; and
    processing the first word vector through a language description model to obtain the description sentence.
  3. The method according to claim 1, characterized in that, after the acquiring a first image, the method further comprises:
    acquiring associated information of the first image, the associated information comprising at least one of the following: location information, time information, and scene information;
    wherein the generating a description sentence according to the recognition result comprises:
    converting the recognition result into a first word vector;
    converting the associated information into a second word vector; and
    processing the first word vector and the second word vector through a language description model to obtain the description sentence.
  4. The method according to claim 1, characterized in that, before the determining the description sentence as an index of the first image and storing the index in correspondence with the first image, the method further comprises:
    displaying inquiry information, the inquiry information being used to ask whether to determine the description sentence as the index; and
    upon receiving a confirmation indication corresponding to the inquiry information, executing the step of determining the description sentence as an index of the first image and storing the index in correspondence with the first image.
  5. The method according to claim 4, characterized in that, after the displaying inquiry information, the method further comprises:
    when the confirmation indication is not received, displaying an input box;
    receiving a sentence input in the input box; and
    determining the input sentence as the index of the first image, and storing the index in correspondence with the first image.
  6. The method according to any one of claims 1 to 5, characterized in that the performing image recognition on the first image to obtain a recognition result corresponding to the first image comprises:
    performing image recognition on the first image through an image recognition model to obtain recognition results respectively corresponding to at least one object in the first image;
    wherein the image recognition model is a neural network model trained with a plurality of sample images, and an object in each of the plurality of sample images corresponds to a classification label.
  7. The method according to any one of claims 1 to 5, characterized in that, before the generating a description sentence according to the recognition result, the method further comprises:
    acquiring a training sample set, the training sample set comprising a plurality of sample images, each sample image corresponding to an expected description sentence corresponding to its recognition result;
    for each sample image, processing the recognition result through a language description model to output an actual description sentence;
    calculating an error between the actual description sentence and the expected description sentence; and
    when the error is greater than a preset threshold, adjusting parameters of the language description model and resuming execution from the step of, for each sample image, processing the recognition result through the language description model to output an actual description sentence, until the error is less than or equal to the preset threshold, at which point training stops and the trained language description model is obtained, the language description model being used to generate the description sentence according to the recognition result.
  8. An image search method, characterized in that the method comprises:
    displaying a search box;
    receiving a first keyword input in the search box;
    searching an album for a second image matching the first keyword, an index corresponding to the second image comprising a first target keyword, the first target keyword matching the first keyword, and the index corresponding to the second image being a description sentence generated according to a recognition result of the second image; and
    displaying a search result, the search result comprising the second image.
  9. The method according to claim 8, characterized in that, before the displaying a search result, the method further comprises:
    when the number of second images is greater than a preset number, displaying prompt information, the prompt information being used to prompt input of a second keyword;
    acquiring the second keyword; and
    searching the second images for a third image matching the second keyword, an index corresponding to the third image comprising a second target keyword, the second target keyword matching the second keyword;
    wherein the search result comprises the third image.
  10. An image index generation device, characterized in that the device comprises:
    an image acquisition module, configured to acquire a first image;
    an image recognition module, configured to perform image recognition on the first image to obtain a recognition result corresponding to the first image;
    a sentence generation module, configured to generate a description sentence according to the recognition result, the description sentence being used to describe the first image; and
    an index generation module, configured to determine the description sentence as an index of the first image, and store the index in correspondence with the first image.
  11. The device according to claim 10, characterized in that the sentence generation module is configured to:
    convert the recognition result into a first word vector; and
    process the first word vector through a language description model to obtain the description sentence.
  12. The device according to claim 10, characterized in that the device further comprises:
    an information acquisition module, configured to acquire associated information of the first image, the associated information comprising at least one of the following: location information, time information, and scene information;
    wherein the sentence generation module is configured to:
    convert the recognition result into a first word vector;
    convert the associated information into a second word vector; and
    process the first word vector and the second word vector through the language description model to obtain the description sentence.
  13. The device according to claim 10, characterized in that the device further comprises:
    an information display module, configured to display inquiry information, the inquiry information being used to ask whether to determine the description sentence as the index;
    the index generation module being further configured to, upon receiving a confirmation indication corresponding to the inquiry information, execute the step of determining the description sentence as an index of the first image and storing the index in correspondence with the first image.
  14. The device according to claim 13, characterized in that the device further comprises:
    an input box display module, configured to display an input box when the confirmation indication is not received;
    a sentence receiving module, configured to receive a sentence input in the input box;
    the index generation module being further configured to determine the input sentence as the index of the first image and store the index in correspondence with the first image.
  15. The device according to any one of claims 10 to 14, characterized in that the image recognition module is configured to:
    perform image recognition on the first image through an image recognition model to obtain recognition results respectively corresponding to at least one object in the first image;
    wherein the image recognition model is a neural network model trained with a plurality of sample images, and an object in each of the plurality of sample images corresponds to a classification label.
  16. The device according to any one of claims 10 to 14, characterized in that the device further comprises:
    a sample set acquisition module, configured to acquire a training sample set, the training sample set comprising a plurality of sample images, each sample image corresponding to an expected description sentence corresponding to its recognition result;
    a sentence output module, configured to, for each sample image, process the recognition result through a language description model to output an actual description sentence;
    an error calculation module, configured to calculate an error between the actual description sentence and the expected description sentence; and
    a model training module, configured to, when the error is greater than a preset threshold, adjust parameters of the language description model and resume execution from the step of, for each sample image, processing the recognition result through the language description model to output an actual description sentence, until the error is less than or equal to the preset threshold, at which point training stops and the trained language description model is obtained, the language description model being used to generate the description sentence according to the recognition result.
  17. An image search device, characterized in that the device comprises:
    a search box display module, configured to display a search box;
    a keyword receiving module, configured to receive a first keyword input in the search box;
    an image search module, configured to search an album for a second image matching the first keyword, an index corresponding to the second image comprising a first target keyword, the first target keyword matching the first keyword, and the index corresponding to the second image being a description sentence generated according to a recognition result of the second image; and
    a result display module, configured to display a search result, the search result comprising the second image.
  18. The device according to claim 17, characterized in that the device further comprises:
    an information display module, configured to display prompt information when the number of second images is greater than a preset number, the prompt information being used to prompt input of a second keyword;
    a keyword acquisition module, configured to acquire the second keyword;
    the image search module being further configured to search the second images for a third image matching the second keyword, an index corresponding to the third image comprising a second target keyword, the second target keyword matching the second keyword;
    wherein the search result comprises the third image.
  19. A terminal, characterized in that the terminal comprises a processor and a memory, the memory storing a computer program, the computer program being loaded and executed by the processor to implement the image index generation method according to any one of claims 1 to 7, or to implement the image search method according to any one of claims 8 to 9.
  20. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program being loaded and executed by a processor to implement the image index generation method according to any one of claims 1 to 7, or to implement the image search method according to any one of claims 8 to 9.
PCT/CN2019/115411 2018-11-30 2019-11-04 Image index generation method, image search method, device, terminal and medium WO2020108234A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811457455.0 2018-11-30
CN201811457455.0A CN109635135A (zh) 2018-11-30 2018-11-30 Image index generation method and device, terminal, and storage medium

Publications (1)

Publication Number Publication Date
WO2020108234A1 (zh)

Family

ID=66070700

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/115411 WO2020108234A1 (zh) 2018-11-30 2019-11-04 图像索引生成方法、图像搜索方法、装置、终端及介质

Country Status (2)

Country Link
CN (1) CN109635135A (zh)
WO (1) WO2020108234A1 (zh)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635135A (zh) * 2018-11-30 2019-04-16 Oppo广东移动通信有限公司 Image index generation method and device, terminal, and storage medium
CN110083729B (zh) * 2019-04-26 2023-10-27 北京金山数字娱乐科技有限公司 Image search method and system
CN110362698A (zh) * 2019-07-08 2019-10-22 北京字节跳动网络技术有限公司 Picture information generation method and device, mobile terminal, and storage medium
CN112541091A (zh) * 2019-09-23 2021-03-23 杭州海康威视数字技术股份有限公司 Image search method and device, server, and storage medium
CN110704654A (zh) * 2019-09-27 2020-01-17 三星电子(中国)研发中心 Picture search method and device
CN112925939A (zh) * 2019-12-05 2021-06-08 阿里巴巴集团控股有限公司 Picture search method, description information generation method, device, and storage medium
CN111046203A (zh) * 2019-12-10 2020-04-21 Oppo广东移动通信有限公司 Image retrieval method and device, storage medium, and electronic device
CN111797765B (zh) * 2020-07-03 2024-04-16 北京达佳互联信息技术有限公司 Image processing method and device, server, and storage medium
CN112711998A (zh) * 2020-12-24 2021-04-27 珠海新天地科技有限公司 3D model annotation system and method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103838724A (zh) * 2012-11-20 2014-06-04 百度在线网络技术(北京)有限公司 Image search method and device
CN106446782A (zh) * 2016-08-29 2017-02-22 北京小米移动软件有限公司 Image recognition method and device
CN106708940A (zh) * 2016-11-11 2017-05-24 百度在线网络技术(北京)有限公司 Method and device for processing pictures
CN107766853A (zh) * 2016-08-16 2018-03-06 阿里巴巴集团控股有限公司 Method for generating and displaying text information of an image, and electronic device
WO2018134964A1 (ja) * 2017-01-20 2018-07-26 楽天株式会社 Image search system, image search method, and program
CN109635135A (zh) * 2018-11-30 2019-04-16 Oppo广东移动通信有限公司 Image index generation method and device, terminal, and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136228A (zh) * 2011-11-25 2013-06-05 阿里巴巴集团控股有限公司 Picture search method and picture search device
CN107908770A (zh) * 2017-11-30 2018-04-13 维沃移动通信有限公司 Photo search method and mobile terminal
CN108021654A (zh) * 2017-12-01 2018-05-11 北京奇安信科技有限公司 Album image processing method and device
CN108509521B (zh) * 2018-03-12 2020-02-18 华南理工大学 Image retrieval method for automatically generating text indexes

Also Published As

Publication number Publication date
CN109635135A (zh) 2019-04-16

Similar Documents

Publication Publication Date Title
WO2020108234A1 (zh) Image index generation method, image search method, device, terminal and medium
JP7091504B2 (ja) Method and apparatus for minimizing false positives in face recognition applications
Gu et al. An empirical study of language cnn for image captioning
WO2019154262A1 (zh) Image classification method, server, user terminal, and storage medium
CA2804230C (en) A computer-implemented method, a computer program product and a computer system for image processing
US20210271707A1 (en) Joint Visual-Semantic Embedding and Grounding via Multi-Task Training for Image Searching
WO2019214453A1 (zh) Content sharing system and method, annotation method, server, and terminal device
CN106897372B (zh) Voice query method and device
US10685236B2 (en) Multi-model techniques to generate video metadata
CN111062871A (zh) Image processing method and device, computer equipment, and readable storage medium
KR102124466B1 (ko) Apparatus and method for generating storyboards for webtoon production
CN116797684B (zh) Image generation method and device, electronic equipment, and storage medium
WO2020044099A1 (zh) Object-recognition-based service processing method and device
WO2023101679A1 (en) Text-image cross-modal retrieval based on virtual word expansion
JP2021535508A (ja) Method and apparatus for reducing false positives in face recognition
US20170171471A1 (en) Method and device for generating multimedia picture and an electronic device
JP6046501B2 (ja) Feature point output device, feature point output program, feature point output method, search device, search program, and search method
WO2022012205A1 (zh) Word completion method and device
KR20230025917A (ko) Augmented-reality-based speech translation associated with travel
Panda et al. Heritage app: annotating images on mobile phones
US8994834B2 (en) Capturing photos
JP7483532B2 (ja) Keyword extraction device, keyword extraction method, and keyword extraction program
WO2014186392A2 (en) Summarizing a photo album
CN117854156B (zh) Training method for a feature extraction model and related device
CN109739970A (zh) Information processing method and device, and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19889402

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19889402

Country of ref document: EP

Kind code of ref document: A1