CN116012841A - Open set image scene matching method and device based on deep learning - Google Patents

Open set image scene matching method and device based on deep learning Download PDF

Info

Publication number
CN116012841A
CN116012841A CN202211633663.8A CN202211633663A CN116012841A CN 116012841 A CN116012841 A CN 116012841A CN 202211633663 A CN202211633663 A CN 202211633663A CN 116012841 A CN116012841 A CN 116012841A
Authority
CN
China
Prior art keywords
deep learning
image
scene
feature vector
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211633663.8A
Other languages
Chinese (zh)
Inventor
刘天利
王廷鸟
雷春霞
王松
傅蕴蓉
瞿二平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202211633663.8A priority Critical patent/CN116012841A/en
Publication of CN116012841A publication Critical patent/CN116012841A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to a method and a device for matching open set image scenes based on deep learning, wherein the method comprises the steps of constructing a deep learning model of a corresponding feature extraction network; importing training data sets covering different image scenes into a deep learning model to obtain reference feature vectors; importing original image data of an image to be identified into a deep learning model to obtain a sample feature vector corresponding to the image to be identified; and calculating the similarity between the sample feature vector and each reference feature vector, screening out the reference feature vectors meeting the similarity threshold, and taking the image scene corresponding to the reference feature vectors as a target optimization scene. By constructing a training data set capable of covering different image scenes, utilizing a multi-center loss function in a deep learning model, reserving a plurality of centers in fully connected layers, selecting the image scenes which are as close as possible as a target optimization scene in a mode of calculating the similarity of feature vectors, and avoiding the defect that an untrained scene, namely an open-set scene, cannot be identified.

Description

Open set image scene matching method and device based on deep learning
Technical Field
The application belongs to the field of image processing, and particularly relates to an open set image scene matching method and device based on deep learning.
Background
Scene recognition is an important branch of the image processing technology field. In the target detection and tracking process under the complex scene, the corresponding image algorithm and image optimization parameters are selected and selected according to the scene recognition result.
The current common scene recognition and classification method provides a technical scheme for scene recognition by using a deep learning method, but cannot recognize untrained scenes due to the limitation of using training data in the scene recognition process, namely has the defect of difficult open set recognition.
Disclosure of Invention
Accordingly, it is necessary to provide a method and a device for matching open-set image scenes based on deep learning in order to solve the above-mentioned technical problems. Under different scenes, the image scene which is as close as possible is selected as the target optimization scene in a mode of calculating the similarity of the feature vectors, so that the defect that an untrained scene cannot be identified is avoided.
In a first aspect, the present application provides an open-set image scene matching method based on deep learning, where the open-set image scene matching method includes:
selecting a feature extraction network based on a deep learning network structure, and constructing a deep learning model corresponding to the feature extraction network;
importing training data sets covering different image scenes into the deep learning model, and calculating to obtain a reference feature vector corresponding to each image scene;
importing original image data of an image to be identified into the deep learning model, and calculating to obtain a sample feature vector corresponding to the image to be identified;
and calculating the similarity between the sample feature vector and each reference feature vector, screening the reference feature vectors meeting the similarity threshold requirement from the calculation result, and taking the image scene corresponding to the reference feature vector as a target optimization scene.
In one embodiment, the selecting a feature extraction network based on a deep learning network structure, and constructing a deep learning model corresponding to the feature extraction network, includes:
selecting a feature extraction network based on a deep learning network structure;
establishing a multi-center loss function corresponding to the feature extraction network;
and constructing a deep learning model based on the multi-center loss function and the feature extraction network.
In one embodiment, the importing the training data set covering different image scenes into the deep learning model, and calculating to obtain the reference feature vector corresponding to each image scene includes:
generating a training data set covering the corresponding different image scenes;
decomposing the training data set into training data corresponding to each image scene;
and importing the training data into the deep learning model, and calculating to obtain a reference feature vector corresponding to each image scene.
In one embodiment, the generating training data sets that cover different image scenes includes:
training data is selected based on scene features of each image scene,
generating a data set label according to the corresponding relation between the training data and the image scene;
and summarizing the training data containing the data set labels to obtain a training data set.
In one embodiment, the importing the training data into the deep learning model, and calculating to obtain the reference feature vector corresponding to each image scene includes:
importing the training data into the deep learning model, and calculating to obtain feature vectors corresponding to each image in the image scene;
and carrying out mean value solving processing on the feature vectors belonging to each image scene, and taking the obtained result as a reference feature vector of each image scene.
In one embodiment, the calculating the similarity between the sample feature vector and each reference feature vector, screening the reference feature vector meeting the similarity threshold requirement from the calculation result, and taking the image scene corresponding to the reference feature vector as the target optimization scene includes:
calculating the similarity between the sample feature vector and each reference feature vector;
and taking the similarity meeting the similarity threshold requirement as reference similarity, determining the reference feature vector corresponding to the reference similarity, and taking the image scene corresponding to the determined reference feature vector as a target optimization scene.
In one embodiment, the open set image scene matching method further includes:
acquiring an evaluation value of the target optimization scene;
and if the evaluation value is beyond the preset range, performing an operation of generating a calibration feature vector.
In one embodiment, the operation of generating the calibration feature vector further includes:
acquiring a second training data set which is transmitted simultaneously with the evaluation value;
and importing the second training data set into the deep learning model, and calculating to obtain a calibration feature vector.
In a second aspect, the present application further provides an open set image scene matching device based on deep learning, including:
the model construction module is used for selecting a feature extraction network based on a deep learning network structure and constructing a deep learning model corresponding to the feature extraction network;
the first vector calculation module is used for importing training data sets covering different image scenes into the deep learning model and calculating to obtain a reference feature vector corresponding to each image scene;
the second vector calculation module is used for importing the original image data of the image to be identified into the deep learning model and calculating to obtain a sample feature vector corresponding to the image to be identified;
the scene selection module is used for calculating the similarity between the sample feature vector and each reference feature vector, screening the reference feature vectors meeting the similarity threshold requirement from the calculation result, and taking the image scene corresponding to the reference feature vectors as a target optimization scene.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
selecting a feature extraction network based on a deep learning network structure, and constructing a deep learning model corresponding to the feature extraction network;
importing training data sets covering different image scenes into the deep learning model, and calculating to obtain a reference feature vector corresponding to each image scene;
importing original image data of an image to be identified into the deep learning model, and calculating to obtain a sample feature vector corresponding to the image to be identified;
and calculating the similarity between the sample feature vector and each reference feature vector, screening the reference feature vectors meeting the similarity threshold requirement from the calculation result, and taking the image scene corresponding to the reference feature vector as a target optimization scene.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of: selecting a feature extraction network based on a deep learning network structure, and constructing a deep learning model corresponding to the feature extraction network;
importing training data sets covering different image scenes into the deep learning model, and calculating to obtain a reference feature vector corresponding to each image scene;
importing original image data of an image to be identified into the deep learning model, and calculating to obtain a sample feature vector corresponding to the image to be identified;
and calculating the similarity between the sample feature vector and each reference feature vector, screening the reference feature vectors meeting the similarity threshold requirement from the calculation result, and taking the image scene corresponding to the reference feature vector as a target optimization scene.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
selecting a feature extraction network based on a deep learning network structure, and constructing a deep learning model corresponding to the feature extraction network;
importing training data sets covering different image scenes into the deep learning model, and calculating to obtain a reference feature vector corresponding to each image scene;
importing original image data of an image to be identified into the deep learning model, and calculating to obtain a sample feature vector corresponding to the image to be identified;
and calculating the similarity between the sample feature vector and each reference feature vector, screening the reference feature vectors meeting the similarity threshold requirement from the calculation result, and taking the image scene corresponding to the reference feature vector as a target optimization scene.
According to the open set image scene matching method, device, computer equipment, storage medium and computer program product based on deep learning, through constructing a training data set capable of covering different image scenes, multiple centers are reserved in fully connected layers by utilizing a multi-center loss function in a deep learning model, and the image scenes which are as close as possible are selected as target optimization scenes in a mode of calculating feature vector similarity, so that the defect that untrained scenes cannot be identified is avoided.
Drawings
FIG. 1 is an application environment diagram of an open set image scene matching method based on deep learning in one embodiment;
FIG. 2 is a flow diagram of an open set image scene matching method based on deep learning in one embodiment;
FIG. 3 is a detailed flow chart of an open set image scene matching method based on deep learning in another embodiment;
FIG. 4 is a block diagram of an open set image scene matching device based on deep learning in one embodiment;
fig. 5 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The open set image scene matching method based on the deep learning can be applied to an application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server. The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
In one embodiment, as shown in fig. 2, there is provided an open-set image scene matching method based on deep learning, and the method is applied to the terminal in fig. 1 for illustration, and includes the following steps:
and S20, selecting a feature extraction network based on a deep learning network structure, and constructing a deep learning model of the corresponding feature extraction network.
The feature extraction network based on the deep learning network structure is typically a backhaul, and other networks such as MobileNet, inception, VGGNet may be selected according to different requirements. The deep learning model obtained by executing the step includes, besides the obtained feature extraction network, a multi-center loss function corresponding to the feature extraction network, where the multi-center is a key parameter corresponding to a plurality of different image scenes, and a specific construction process of the function is described in detail later, which is not repeated here.
Step S40, a training data set covering different image scenes is imported into a deep learning model, and a reference feature vector corresponding to each image scene is obtained through calculation.
The calculation of the feature vector of each image scene can be completed based on the obtained deep learning model. The training data set used as the training deep learning model and covering different scene images is imported into the deep learning model, so that a plurality of feature vectors corresponding to each image scene output by the deep learning model can be obtained, and further, the plurality of feature vectors under each image scene are processed to obtain a reference feature vector corresponding to each image scene, wherein the reference feature vector is used for carrying out similarity calculation with the feature vector of the image to be identified in the subsequent step, and confirmation of the image scene close to the image to be identified is completed.
Step S60, the original image data of the image to be identified is imported into a deep learning model, and a sample feature vector corresponding to the image to be identified is obtained through calculation.
Similar to the execution concept of step S40, the original image data of the image to be identified is also imported into the obtained deep learning model, so as to obtain the sample feature vector corresponding to the image to be identified, which is output by the deep learning model. The sample feature vector is used for carrying out similarity calculation by combining the reference feature vector corresponding to each image scene obtained in the previous step, so that the sample feature vector is subjected to similarity calculation with the feature vector of the image to be identified in the subsequent step, and confirmation of the image scene close to the image to be identified is completed.
Step S80, calculating the similarity between the sample feature vector and each reference feature vector, screening the reference feature vectors meeting the similarity threshold requirement from the calculation result, and taking the image scene corresponding to the reference feature vectors as a target optimization scene.
The similarity between the sample feature vector and each reference feature vector is calculated respectively for the reference feature vectors corresponding to different image scenes obtained in the previous step, the reference feature vector meeting the similarity threshold requirement is selected, and the image scene corresponding to the selected reference feature vector is used as a target optimization scene of the image to be identified. And if a plurality of image scenes are selected, carrying out subsequent optimization processing on the image to be identified by taking the obtained plurality of image scenes as target optimization scenes.
According to the open-set image scene matching method based on the deep learning and composed by the technical steps, one or more target optimization scenes corresponding to the image to be identified can be determined based on the similarity calculation mode of the feature vectors, and optimization processing of the image to be identified is facilitated according to the obtained target optimization scenes in subsequent operation. Compared with the scheme that only one target optimization scene can be obtained in the prior art, the extraction of the feature center and the feature vector difference calculation are increased, the adaptation range of obtaining the target optimization scene can be enlarged, and the method has a good adaptation effect especially for an untrained image scene.
In one embodiment, selecting a feature extraction network based on a deep learning network structure, and constructing a deep learning model of the corresponding feature extraction network, namely, step S20, includes:
step S22, selecting a feature extraction network based on a deep learning network structure;
step S24, establishing a multi-center loss function of the corresponding feature extraction network;
and S26, constructing a deep learning model based on the multi-center loss function and the feature extraction network.
In implementation, based on the description of step S20, the deep learning model to be constructed includes a feature extraction network and a multi-center loss function suitable for computing feature vectors of various image scenes.
The feature extraction network is selected from the existing backbone, mobileNet, inception, VGGNet mature network structures and the like to meet requirements. And the specific content of the multi-center loss function of the corresponding feature extraction network is established as shown in a formula I:
Figure BDA0004006765130000071
the multicenter loss function given by equation one is a SoftTriple loss function, where l SoftTriple (x i ) As a multi-center loss function, x i Extracting a vector, parameters for the features of the ith sample
Figure BDA0004006765130000072
Then the specific manner in which the slack similarity is calculated for determining the slack similarity between the example and the particular class is as shown in equation two:
Figure BDA0004006765130000073
in the above formula, x i Feature extraction vector representing the i-th sample, [ w ] 1 ,w 2 ...w c ]∈R d*c Is the last fully connected layer, c represents the number of classes, d is the dimension of ebedding, k represents k centers per class (scene), example x i Similarity with class C is S i ' ,c Delta is a predeterminedA spacing is defined, λ, γ being the scaling factor.
In one embodiment, the training data set covering different image scenes is imported into a deep learning model, and a reference feature vector corresponding to each image scene is calculated, that is, step S40 includes:
step S42, generating a training data set which covers different image scenes;
step S44, decomposing the training data set into training data corresponding to each image scene;
step S46, training data is imported into the deep learning model, and reference feature vectors corresponding to each image scene are obtained through calculation.
In order to obtain the reference feature vector for each image scene, firstly, a training data set capable of covering different image scenes is required to be constructed, secondly, the obtained training data set is divided into training data of each image scene, and finally, the obtained training data is imported into an existing deep learning model and calculated to obtain the reference feature vector corresponding to each image scene.
Wherein, generating a training data set covering different image scenes, step S42, includes:
step S422, selecting training data based on scene features of each image scene;
step S424, generating a data set label according to the corresponding relation between the training data and the image scene;
in step S426, training data including data set labels is summarized to obtain a training data set.
In practice, to construct a training data set, first, scene features corresponding to each image scene need to be acquired, and then training data is selected based on the determined scene features. The scene features specifically comprise illumination intensity, dynamic range, typical pixel values and the like, and training data corresponding to the scene features are preferentially selected.
It should be noted that the training data here contains a plurality of images.
And then determining the data set labels belonging to each image scene according to the corresponding relation between the training data and the image scenes, so that the training data is summarized based on the data set labels, and a training data set for importing a deep learning model is obtained.
The training data is imported into the deep learning model, and a reference feature vector corresponding to each image scene is calculated, that is, step S46 includes:
step S462, training data is imported into a deep learning model, and feature vectors of each image in the corresponding image scene are obtained through calculation;
in step S464, the feature vectors belonging to each image scene are averaged, and the obtained result is used as the reference feature vector of each image scene.
In implementation, after training data is imported into the deep learning model, feature vectors corresponding to each image scene, namely each image under each dataset label, are obtained through calculation based on multi-center loss functions in the deep learning model. Since the number of feature vectors in each image scene is too large, if the subsequent similarity calculation is directly performed according to the obtained feature vectors, the operation amount and operation time are greatly increased, so that the operation of step S464 is also required to be performed, and the average value of all the feature vectors in each image scene is obtained in a manner of obtaining the average value and is used as the reference feature vector in the corresponding current image scene.
In one embodiment, the similarity between the sample feature vector and each reference feature vector is calculated, the reference feature vector meeting the similarity threshold requirement is screened from the calculation result, and the image scene corresponding to the reference feature vector is used as the target optimization scene, namely, step S80 includes:
step S82, calculating the similarity between the sample feature vector and each reference feature vector;
and S84, taking the similarity meeting the similarity threshold requirement as reference similarity, determining a reference feature vector corresponding to the reference similarity, and taking the image scene corresponding to the determined reference feature vector as a target optimization scene.
In practice, in order to determine an image scene matching the image to be identified, a determination needs to be made in combination with the similarity result between the sample feature vector obtained in the previous step and each reference feature vector.
First, the similarity between a sample feature vector and a single reference feature vector of a plurality of reference feature vectors is calculated based on a mature similarity calculation method. If the similarity is greater than a given similarity threshold, then the current image scene is determined to be a matching scene.
Typically as shown in equation three:
Figure BDA0004006765130000091
the third formula is to calculate the similarity contrast by using cosine similarity, wherein Sim is a similarity calculator, x i 、y i Representing the feature vector of the image to be identified and the feature vector of a scene of a certain category respectively, and d represents the dimension of the vector.
The similarity threshold value used for screening the similarity result is an adjustable parameter, and is generally set to be 0.8, when the similarity between the image to be identified and the characteristics of a certain scene is greater than 0.8, the image is judged to be the scene, and when special requirements exist, for example, an algorithm is required to be sensitive to a backlight scene, the parameter is adjusted to be small. Based on the judging principle, a certain scene can be simultaneously judged to be a plurality of scene categories by adjusting the value of the similarity threshold, so that the aim of matching multiple images is fulfilled.
In one embodiment, the open set image scene matching method further includes:
step S92, obtaining an evaluation value of a target optimization scene;
in step S94, if the evaluation value exceeds the preset range, an operation of generating a calibration feature vector is performed.
In practical application, it is found that the discrimination of different testers and users on the same scene is quite different, for example, a certain backlight scene has different requirements for two users, one is considered to be free from starting a wide dynamic technology, and the other is considered to be required to be started. Because the subjectivity problem of user evaluation cannot be solved by simple sensitivity threshold adjustment, a feedback mechanism is provided in the step, if an evaluation value which is considered to be a great difference between a matched target optimization scene and an expected one is received, less scene discrimination image data and labels which are transmitted simultaneously with the evaluation value are only required to be acquired, so that a calibration feature vector which accords with a subjective discrimination criterion is obtained, and a great number of training data sets are not required to be manufactured again.
After determining the target optimization scene corresponding to the image to be identified, the evaluation value of the target optimization scene can be obtained. The evaluation value is used for quantitatively describing the matching degree between the target optimization scene and the image to be identified. In order to accurately quantify the evaluation value, a preset range of the corresponding evaluation value needs to be established.
If the evaluation value is within the preset range, the matching degree of the target optimization scene symbol determined based on the previous steps can be determined to be satisfactory; if the evaluation value exceeds the preset range, the fact that the determined target optimization scene has a large difference from the scene corresponding relation of the image to be identified is indicated, and the operation of generating the calibration feature vector is needed to be executed.
The calibration feature vector is a group of scene images marked by user judgment standards are acquired again on the premise of dissatisfaction with the obtained target optimization scene. And then re-performs step S40 to obtain a new reference feature vector.
The operation of generating the calibration feature vector, that is, step S94 further includes:
step S942, obtaining a second training data set which is transmitted simultaneously with the evaluation value;
in step S944, the second training data set is imported into the deep learning model, and the calibration feature vector is calculated.
In the implementation, if the evaluation value exceeds the preset range, the second training data set transmitted simultaneously with the evaluation value is received, and the specific operation in step S46 is performed on the second training data set, so as to obtain the calibration feature vector corresponding to the second training data set. And further calculating the similarity between the calibration feature vector and the calibration feature vector, and executing the target optimization scene screening method based on the similarity threshold requirement again according to the similarity.
It is worth noting that if the data amount of the target optimization scene obtained by screening is too large, the similarity threshold can be properly adjusted, so that the specific number of the target optimization scenes is realized, and the targeted optimization processing effect of the image to be identified based on the target optimization scenes is ensured to meet the expectations.
Based on the foregoing technical solutions, a detailed embodiment of performing object scene matching on an image to be identified is provided herein, as shown in fig. 3, and specifically includes:
and S1, selecting a feature extraction network based on a deep learning network structure.
And S2, establishing a multi-center loss function of the corresponding feature extraction network.
And S3, constructing a deep learning model based on the multi-center loss function and the feature extraction network.
And S4, generating a training data set which covers different image scenes.
And S5, decomposing the training data set into training data corresponding to each image scene.
And S6, importing training data into a deep learning model, and calculating to obtain a reference feature vector corresponding to each image scene.
And S7, importing the original image data of the image to be identified into a deep learning model, and calculating to obtain a sample feature vector corresponding to the image to be identified.
Step S8, calculating the similarity between the sample feature vector and each reference feature vector.
And S9, taking the similarity meeting the similarity threshold requirement as the reference similarity.
Step S10, determining a reference feature vector corresponding to the reference similarity, and taking an image scene corresponding to the determined reference feature vector as a target optimization scene.
Step S11, obtaining an evaluation value of the target optimization scene.
Step S12, if the evaluation value exceeds the preset range, an operation of generating a calibration feature vector is performed.
It should be understood that, although the steps in the flowcharts related to the above embodiments are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides an open set image scene matching device for realizing the open set image scene matching method. The implementation scheme of the open set image scene matching device for solving the problem is similar to that described in the above method, so the specific limitation in the embodiments of the open set image scene matching device or devices provided below can be referred to the limitation of the open set image scene matching method in the above description, and the description is omitted here.
In one embodiment, as shown in fig. 4, there is provided an open set image scene matching apparatus 400, comprising: a model building module 420, a first vector calculation module 440, a second vector calculation module 460, and a scene selection module 480, wherein:
the model construction module 420 is configured to select a feature extraction network based on a deep learning network structure, and construct a deep learning model corresponding to the feature extraction network;
the feature extraction network based on the deep learning network structure is typically a backhaul, and other networks such as MobileNet, inception, VGGNet may be selected according to different requirements. The deep learning model obtained by executing the step includes, besides the obtained feature extraction network, a multi-center loss function corresponding to the feature extraction network, where the multi-center is a key parameter corresponding to a plurality of different image scenes, and a specific construction process of the function is described in detail later, which is not repeated here.
A first vector calculation module 440, configured to import training data sets covering different image scenes into the deep learning model, and calculate a reference feature vector corresponding to each image scene;
the calculation of the feature vector of each image scene can be completed based on the obtained deep learning model. The training data set used as the training deep learning model and covering different scene images is imported into the deep learning model, so that a plurality of feature vectors corresponding to each image scene output by the deep learning model can be obtained, and further, the plurality of feature vectors under each image scene are processed to obtain a reference feature vector corresponding to each image scene, wherein the reference feature vector is used for carrying out similarity calculation with the feature vector of the image to be identified in the subsequent step, and confirmation of the image scene close to the image to be identified is completed.
The second vector calculation module 460 is configured to import the original image data of the image to be identified into the deep learning model, and calculate a sample feature vector corresponding to the image to be identified;
similar to the execution concept of step S40, the original image data of the image to be identified is also imported into the obtained deep learning model, so as to obtain the sample feature vector corresponding to the image to be identified, which is output by the deep learning model. The sample feature vector is used for carrying out similarity calculation by combining the reference feature vector corresponding to each image scene obtained in the previous step, so that the sample feature vector is subjected to similarity calculation with the feature vector of the image to be identified in the subsequent step, and confirmation of the image scene close to the image to be identified is completed.
The scene selection module 480 is configured to calculate the similarity between the sample feature vector and each of the reference feature vectors, screen the reference feature vectors meeting the similarity threshold requirement from the calculation result, and use the image scene corresponding to the reference feature vector as the target optimization scene.
The similarity between the sample feature vector and each reference feature vector is calculated respectively for the reference feature vectors corresponding to different image scenes obtained in the previous step, the reference feature vector meeting the similarity threshold requirement is selected, and the image scene corresponding to the selected reference feature vector is used as a target optimization scene of the image to be identified. And if a plurality of image scenes are selected, carrying out subsequent optimization processing on the image to be identified by taking the obtained plurality of image scenes as target optimization scenes.
According to the open-set image scene matching device method based on deep learning and composed of the modules, one or more target optimization scenes corresponding to the image to be identified can be determined based on the similarity calculation mode of the feature vectors, and optimization processing of the image to be identified is facilitated according to the obtained target optimization scenes in subsequent operation. Compared with the scheme that only one target optimization scene can be obtained in the prior art, the extraction of the feature center and the feature vector difference calculation are increased, the adaptation range of obtaining the target optimization scene can be enlarged, and the method has a good adaptation effect especially for an untrained image scene.
The respective modules in the open set image scene matching device 400 described above may be implemented in whole or in part by software, hardware, and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 5. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data of an open set image scene matching method based on deep learning. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a method for open-set image scene matching based on deep learning.
It will be appreciated by those skilled in the art that the structure shown in fig. 5 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
and S20, selecting a feature extraction network based on a deep learning network structure, and constructing a deep learning model of the corresponding feature extraction network.
Step S40, a training data set covering different image scenes is imported into a deep learning model, and a reference feature vector corresponding to each image scene is obtained through calculation.
Step S60, the original image data of the image to be identified is imported into a deep learning model, and a sample feature vector corresponding to the image to be identified is obtained through calculation.
Step S80, calculating the similarity between the sample feature vector and each reference feature vector, screening the reference feature vectors meeting the similarity threshold requirement from the calculation result, and taking the image scene corresponding to the reference feature vectors as a target optimization scene.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
and S20, selecting a feature extraction network based on a deep learning network structure, and constructing a deep learning model of the corresponding feature extraction network.
Step S40, a training data set covering different image scenes is imported into a deep learning model, and a reference feature vector corresponding to each image scene is obtained through calculation.
Step S60, the original image data of the image to be identified is imported into a deep learning model, and a sample feature vector corresponding to the image to be identified is obtained through calculation.
Step S80, calculating the similarity between the sample feature vector and each reference feature vector, screening the reference feature vectors meeting the similarity threshold requirement from the calculation result, and taking the image scene corresponding to the reference feature vectors as a target optimization scene.
In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:
and S20, selecting a feature extraction network based on a deep learning network structure, and constructing a deep learning model of the corresponding feature extraction network.
Step S40, a training data set covering different image scenes is imported into a deep learning model, and a reference feature vector corresponding to each image scene is obtained through calculation.
Step S60, the original image data of the image to be identified is imported into a deep learning model, and a sample feature vector corresponding to the image to be identified is obtained through calculation.
Step S80, calculating the similarity between the sample feature vector and each reference feature vector, screening the reference feature vectors meeting the similarity threshold requirement from the calculation result, and taking the image scene corresponding to the reference feature vectors as a target optimization scene.
It should be noted that, user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples only represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the present application. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. The open set image scene matching method based on the deep learning is characterized by comprising the following steps of:
selecting a feature extraction network based on a deep learning network structure, and constructing a deep learning model corresponding to the feature extraction network;
importing training data sets covering different image scenes into the deep learning model, and calculating to obtain a reference feature vector corresponding to each image scene;
importing original image data of an image to be identified into the deep learning model, and calculating to obtain a sample feature vector corresponding to the image to be identified;
and calculating the similarity between the sample feature vector and each reference feature vector, screening the reference feature vectors meeting the similarity threshold requirement from the calculation result, and taking the image scene corresponding to the reference feature vector as a target optimization scene.
2. The method for matching an open set image scene based on deep learning according to claim 1, wherein said selecting a feature extraction network based on a deep learning network structure, constructing a deep learning model corresponding to the feature extraction network, comprises:
selecting a feature extraction network based on a deep learning network structure;
establishing a multi-center loss function corresponding to the feature extraction network;
and constructing a deep learning model based on the multi-center loss function and the feature extraction network.
3. The method for matching open-set image scenes based on deep learning according to claim 1, wherein said importing training data sets covering different image scenes into said deep learning model, calculating a reference feature vector corresponding to each of said image scenes, comprises:
generating a training data set covering the corresponding different image scenes;
decomposing the training data set into training data corresponding to each image scene;
and importing the training data into the deep learning model, and calculating to obtain a reference feature vector corresponding to each image scene.
4. The deep learning based open set image scene matching method of claim 3, wherein said generating training data sets covering corresponding different image scenes comprises:
training data is selected based on scene features of each image scene,
generating a data set label according to the corresponding relation between the training data and the image scene;
and summarizing the training data containing the data set labels to obtain a training data set.
5. The method for matching an open set image scene based on deep learning according to claim 3, wherein said importing the training data into the deep learning model, and calculating the reference feature vector corresponding to each image scene, comprises:
importing the training data into the deep learning model, and calculating to obtain feature vectors corresponding to each image in the image scene;
and carrying out mean value solving processing on the feature vectors belonging to each image scene, and taking the obtained result as a reference feature vector of each image scene.
6. The deep learning-based open-set image scene matching method according to claim 1, wherein the calculating the similarity between the sample feature vector and each of the reference feature vectors, screening the reference feature vectors meeting a similarity threshold requirement from the calculation result, and taking the image scene corresponding to the reference feature vectors as a target optimization scene comprises:
calculating the similarity between the sample feature vector and each reference feature vector;
and taking the similarity meeting the similarity threshold requirement as reference similarity, determining the reference feature vector corresponding to the reference similarity, and taking the image scene corresponding to the determined reference feature vector as a target optimization scene.
7. The open-set image scene matching method based on deep learning according to claim 1, further comprising:
acquiring an evaluation value of the target optimization scene;
and if the evaluation value is beyond the preset range, performing an operation of generating a calibration feature vector.
8. The deep learning based open set image scene matching method of claim 7, wherein said operation of generating calibration feature vectors further comprises:
acquiring a second training data set which is transmitted simultaneously with the evaluation value;
and importing the second training data set into the deep learning model, and calculating to obtain a calibration feature vector.
9. An open-set image scene matching device based on deep learning, which is characterized by comprising:
the model construction module is used for selecting a feature extraction network based on a deep learning network structure and constructing a deep learning model corresponding to the feature extraction network;
the first vector calculation module is used for importing training data sets covering different image scenes into the deep learning model and calculating to obtain a reference feature vector corresponding to each image scene;
the second vector calculation module is used for importing the original image data of the image to be identified into the deep learning model and calculating to obtain a sample feature vector corresponding to the image to be identified;
the scene selection module is used for calculating the similarity between the sample feature vector and each reference feature vector, screening the reference feature vectors meeting the similarity threshold requirement from the calculation result, and taking the image scene corresponding to the reference feature vectors as a target optimization scene.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the deep learning based open set image scene matching method of any one of claims 1 to 8 when the computer program is executed.
CN202211633663.8A 2022-12-19 2022-12-19 Open set image scene matching method and device based on deep learning Pending CN116012841A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211633663.8A CN116012841A (en) 2022-12-19 2022-12-19 Open set image scene matching method and device based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211633663.8A CN116012841A (en) 2022-12-19 2022-12-19 Open set image scene matching method and device based on deep learning

Publications (1)

Publication Number Publication Date
CN116012841A true CN116012841A (en) 2023-04-25

Family

ID=86034742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211633663.8A Pending CN116012841A (en) 2022-12-19 2022-12-19 Open set image scene matching method and device based on deep learning

Country Status (1)

Country Link
CN (1) CN116012841A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116522501A (en) * 2023-05-05 2023-08-01 中国船级社上海规范研究所 Real ship verification system based on safe return port

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116522501A (en) * 2023-05-05 2023-08-01 中国船级社上海规范研究所 Real ship verification system based on safe return port
CN116522501B (en) * 2023-05-05 2024-02-13 中国船级社上海规范研究所 Real ship verification system based on safe return port

Similar Documents

Publication Publication Date Title
Jia et al. Saliency-based deep convolutional neural network for no-reference image quality assessment
CN111738243B (en) Method, device and equipment for selecting face image and storage medium
CN111325271B (en) Image classification method and device
US9292911B2 (en) Automatic image adjustment parameter correction
WO2023206944A1 (en) Semantic segmentation method and apparatus, computer device, and storage medium
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN112232397A (en) Knowledge distillation method and device of image classification model and computer equipment
CN116012841A (en) Open set image scene matching method and device based on deep learning
Lee et al. Property-specific aesthetic assessment with unsupervised aesthetic property discovery
CN116630630B (en) Semantic segmentation method, semantic segmentation device, computer equipment and computer readable storage medium
WO2024041108A1 (en) Image correction model training method and apparatus, image correction method and apparatus, and computer device
CN111898544A (en) Character and image matching method, device and equipment and computer storage medium
TWI803243B (en) Method for expanding images, computer device and storage medium
CN116739938A (en) Tone mapping method and device and display equipment
CN112052863B (en) Image detection method and device, computer storage medium and electronic equipment
CN114463764A (en) Table line detection method and device, computer equipment and storage medium
CN109583512B (en) Image processing method, device and system
CN115761239B (en) Semantic segmentation method and related device
CN116630629B (en) Domain adaptation-based semantic segmentation method, device, equipment and storage medium
CN117235584B (en) Picture data classification method, device, electronic device and storage medium
WO2022141092A1 (en) Model generation method and apparatus, image processing method and apparatus, and readable storage medium
CN117670686A (en) Video frame enhancement method, device, computer equipment and storage medium
CN117440179A (en) Video updating method, device, computer equipment and storage medium
CN117670726A (en) Image enhancement method, device, computer equipment and storage medium
CN116805292A (en) Tone mapping method and device and display equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination