CN112115290B - VR panorama scheme matching method based on image intelligent retrieval - Google Patents

VR panorama scheme matching method based on image intelligent retrieval

Info

Publication number
CN112115290B
CN112115290B (application CN202010809509.6A)
Authority
CN
China
Prior art keywords
inter
function
layer
scheme
panorama
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010809509.6A
Other languages
Chinese (zh)
Other versions
CN112115290A (en)
Inventor
万倩倩
周兵
王庆利
苏亮亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Weilijia Intelligent Technology Co ltd
Nanjing Zhishan Intelligent Science And Technology Research Institute Co ltd
Original Assignee
Nanjing Weilijia Intelligent Technology Co ltd
Nanjing Zhishan Intelligent Science And Technology Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Weilijia Intelligent Technology Co ltd and Nanjing Zhishan Intelligent Science And Technology Research Institute Co ltd
Priority to CN202010809509.6A
Publication of CN112115290A
Application granted
Publication of CN112115290B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a VR panorama scheme matching method based on intelligent image retrieval, which comprises the following steps: constructing a VR panorama scheme database in which VR panorama scheme links, inter-function labels (labels naming the function room shown in a picture, such as kitchen or living room) and inter-function picture feature vectors are stored and mutually associated; extracting the inter-function label and the inter-function picture feature vector of an input image to be retrieved by using a convolutional neural network model; and performing label retrieval on the VR panorama scheme database using the inter-function label and the inter-function picture feature vector to find matched VR panorama schemes. With this method, similar schemes can be matched quickly and accurately in a massive VR panorama scheme library from a single indoor effect picture.

Description

VR panorama scheme matching method based on image intelligent retrieval
Technical Field
The application relates to a VR panorama scheme matching method, in particular to a VR panorama scheme matching method based on image intelligent retrieval.
Background
Combining virtual reality technology with interior design allows a design to be presented to the user in an entirely new way. The computer simulates the indoor space and environment, showing graceful color schemes, soft and warm lighting, pleasing spatial layouts and enduring artistic forms, and so composes a gorgeous VR panorama scheme. The injection of artificial intelligence has brought new momentum to interior design, making it more convenient, more intelligent and more efficient. In recent years deep learning has been particularly notable in this field; indoor three-dimensional scene recognition and indoor model retrieval are both advancing toward more intelligent goals.
With the spread of "Internet Plus" business models, the traditional face-to-face mode in which designers recommend designs to users is gradually being replaced by online modes. More and more interior design companies therefore upload their VR panorama schemes to websites or applets for users to browse. A scheme is displayed mainly through panoramic roaming, with still pictures as a supplement; the user can roam through an indoor panorama scheme without leaving home, as if physically present.
Every new business model meets some resistance as it emerges. Research on the interior design field shows that, to satisfy user demand, many companies have designed large numbers of VR panorama schemes for users to choose from. For the user, picking a favorite scheme out of this enormous pool is as difficult and tedious as fishing a needle out of the sea. Interior designers therefore divide panorama schemes into Chinese style, European style, American style and so on, and users search by text.
Because users lack professional knowledge of interior design, perceive color, space and artistic composition only vaguely, and are limited by what text can express, they cannot quickly and accurately pick a favorite VR panorama scheme from a large set by text alone. If a user has only a single favorite indoor effect picture, no match against that picture can be made in a massive panorama scheme library. Moreover, rendered indoor effect pictures are currently labeled by function manually, which wastes considerable manpower, is inefficient and lacks intelligence.
These problems seriously affect the user's choice of VR panorama scheme and the efficiency of effect-picture classification. The traditional ways of selecting panorama schemes and classifying effect pictures cannot meet the development needs of the interior design field or the demand for a more intelligent, more efficient user experience. Many interior design companies and professionals urgently need a VR panorama matching system driven by effect pictures. It is therefore necessary to design a VR panorama scheme matching method based on intelligent image retrieval that can extract information from a picture to be retrieved for fast, intelligent search in a panorama scheme library, while automatically classifying input pictures by function.
Disclosure of Invention
The application aims to: provide a VR panorama scheme matching method based on intelligent image retrieval that can extract information from a picture to be retrieved for fast, intelligent search in a panorama scheme library, while automatically classifying input pictures by function.
The technical scheme is as follows: the application discloses a VR panorama scheme matching method based on image intelligent retrieval, which comprises the following steps:
step 1, constructing a VR panorama scheme database, wherein VR panorama scheme links, inter-function labels and inter-function picture feature vectors are stored in the VR panorama scheme database, and the VR panorama scheme links, the inter-function labels and the inter-function picture feature vectors are mutually associated;
step 2, extracting inter-function labels and inter-function picture feature vectors of the input image to be retrieved by using a convolutional neural network model;
and 3, carrying out tag retrieval on the VR panorama scheme database by utilizing the inter-function tags and the inter-function picture feature vectors to find a matched VR panorama scheme.
Further, in step 1, the specific steps of constructing the VR panorama scheme database are as follows:
step 1.1, a scheme table and a plurality of inter-function tables are arranged in a VR panorama scheme database, the scheme table is used for storing each VR panorama scheme link and corresponding scheme id, each inter-function table is used for respectively storing each inter-function picture feature vector and corresponding scheme id, inter-function labels correspond to the inter-function tables, and the VR panorama scheme links, the inter-function labels and the inter-function picture feature vectors are mutually related through the scheme id;
step 1.2, obtaining scheme data from an existing VR panorama scheme, wherein the scheme data comprises VR panorama scheme links, scheme ids, inter-function labels and inter-function picture feature vectors;
and 1.3, correspondingly storing the obtained scheme data in a scheme table and a function-to-function table, thereby establishing a VR panorama scheme database.
Further, in step 2, the specific steps of extracting the inter-function labels and the inter-function picture feature vectors by using the convolutional neural network model are as follows:
step 2.1, constructing a convolutional neural network which sequentially comprises a CONV1 layer, a Max1 layer, a CONV2 layer, a Max2 layer, a CONV3 layer, a Max3 layer, a Flatten layer, an FC1 layer, an FC2 layer, an FC3 layer and a Softmax classifier;
step 2.2, training the constructed convolutional neural network with sample data, using Batch training with a batch-size of 64; during training, data are fed into the network model, the output of the network is computed layer by layer, and the parameters are then updated by gradient descent so that the network model approaches the optimal solution;
Step 2.3, optimizing the convolutional neural network model by using a Dropout and ELU activation function;
and 2.4, extracting inter-function labels and inter-function picture feature vectors of the input image to be retrieved by utilizing the optimized convolutional neural network.
Further, in step 2.1, the input of the convolutional neural network is a 224×224×3 digital matrix; the CONV1, CONV2 and CONV3 layers are convolutional layers; the Max1, Max2 and Max3 layers are max-pooling layers; the FC1, FC2 and FC3 layers are fully connected layers; and the Flatten layer is an intermediate layer that unrolls the convolutional output into the fully connected layers.
In step 2.3, a Dropout layer is added between the FC1 layer and the FC2 layer and between the FC2 layer and the FC3 layer, and the Dropout parameter is set to 0.3.
Further, in step 2.3, when the convolutional neural network model is optimized by using the ELU activation function, the ELU activation function is applied to the CONV1 layer, the CONV2 layer and the CONV3 layer, the ELU activation function being:
f(x) = x, if x > 0;  f(x) = α(e^x − 1), if x ≤ 0
where α is an adjustable parameter that controls when the negative part of the ELU activation function reaches saturation, x represents the function input and f(x) represents the activation output.
Further, in step 2.1, the Softmax classifier is an improved Softmax classifier, whose calculation formula is:
S_i = e^(V_i − max(V)) / Σ_{j=1..length} e^(V_j − max(V))
where V represents the input vector, i represents a position index into the input vector, length represents the dimension of the input vector, and S represents the corresponding probability result.
Further, in step 2.4, when the optimized convolutional neural network is used to extract the inter-function labels and the inter-function picture feature vectors of the input image to be retrieved, the improved Softmax classifier outputs the inter-function labels, and the output data of the FC1 layer is intercepted as the inter-function picture feature vectors.
Further, in step 3, the specific steps of performing tag search in the VR panorama scheme database by using inter-function tags and inter-function picture feature vectors are as follows:
step 3.1, judging whether the input image to be searched is an effective inter-function image through the inter-function label, if so, entering step 3.2, if not, prompting to input the image to be searched again, and returning to step 2;
step 3.2, searching in the VR panorama scheme database by utilizing the inter-function label of the image to be searched, searching for the corresponding inter-function label of the image to be searched in the VR panorama scheme database, and finding out the corresponding inter-function table in the VR panorama scheme database according to the inter-function label obtained by searching;
step 3.3, obtaining the similarity between the feature vector of the image to be retrieved and each inter-function picture feature vector stored in the inter-function table, the similarity being calculated as:
d = √( Σ_i (h_i − q_i)² ),  s = 1 / (1 + d)
where d is the Euclidean distance, s is the similarity, h and q are the two input vectors, and i is the index into the input vectors;
step 3.4, sorting the similarity results, selecting the top N inter-function picture feature vectors with the highest similarity, and storing the scheme ids corresponding to those feature vectors;
and 3.5, finding a corresponding VR panorama scheme link in the scheme table according to the saved scheme id, and finding each matched VR panorama scheme according to the VR panorama scheme link.
Further, in step 3.3, before performing similarity calculation, the feature vector of the image to be searched and the feature vector of the inter-function picture in the inter-function table are subjected to dimension reduction processing by using a PCA algorithm, so that the feature vector of the image to be searched and the feature vector of the inter-function picture in the inter-function table are reduced to 256 dimensions.
Compared with the prior art, the application has the beneficial effects that: a convolutional neural network model suited to effect pictures is designed and trained; it can extract both inter-function labels and picture feature vectors, and when used for classification its image classification accuracy exceeds 96%; and by inputting a single image to be retrieved, panorama schemes similar to the information contained in that picture are output, so matching efficiency and accuracy are both high.
Drawings
FIG. 1 is a flow chart of the method of the present application;
FIG. 2 is a flowchart of tag screening according to the present application;
FIG. 3 is a schematic diagram of a convolutional neural network model of the present application;
fig. 4 is an inter-function picture and a label thereof according to the present application.
Detailed Description
The technical scheme of the present application will be described in detail with reference to the accompanying drawings, but the scope of the present application is not limited to the embodiments.
Example 1:
As shown in FIG. 1, the VR panorama scheme matching method based on intelligent image retrieval disclosed by the application automatically extracts picture information through a convolutional neural network and, by combining text retrieval with content retrieval, quickly matches schemes in the database; it comprises the following steps:
step 1, constructing a VR panorama scheme database, wherein VR panorama scheme links, inter-function labels and inter-function picture feature vectors are stored in the VR panorama scheme database, and the VR panorama scheme links, the inter-function labels and the inter-function picture feature vectors are mutually associated, as shown in FIG. 4, which is a schematic diagram of the inter-function picture and the labels thereof;
step 2, extracting inter-function labels and inter-function picture feature vectors of the input image to be retrieved by using a convolutional neural network model;
and 3, carrying out tag retrieval on the VR panorama scheme database by utilizing the inter-function tags and the inter-function picture feature vectors to find a matched VR panorama scheme.
A convolutional neural network model suited to effect pictures is designed and trained; it can extract both inter-function labels and picture feature vectors, and when used for classification its image classification accuracy exceeds 96%; by inputting a single image to be retrieved, panorama schemes similar to the information contained in that picture are output, so matching efficiency and accuracy are both high.
Further, converting the panoramic scheme into an image matching problem by using a database, and imaging the complex VR panoramic scheme data, wherein in step 1, the specific steps of constructing the VR panoramic scheme database are as follows:
step 1.1, a scheme table and a plurality of inter-function tables are arranged in a VR panorama scheme database, the scheme table is used for storing each VR panorama scheme link and corresponding scheme id, each inter-function table is used for respectively storing each inter-function picture feature vector and corresponding scheme id, inter-function labels correspond to the inter-function tables, and the VR panorama scheme links, the inter-function labels and the inter-function picture feature vectors are mutually related through the scheme id;
step 1.2, obtaining scheme data from existing VR panorama schemes, the scheme data comprising VR panorama scheme links, scheme ids, inter-function labels and inter-function picture feature vectors; when the VR panorama scheme database is established, the inter-function labels and inter-function picture feature vectors of the training samples are extracted with a convolutional neural network model identical to the one used in step 2;
and 1.3, correspondingly storing the obtained scheme data in a scheme table and a function-to-function table, thereby establishing a VR panorama scheme database.
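For illustration only, the following is a minimal sketch of this schema using Python's sqlite3. The database engine, table names and column names are assumptions; the patent specifies only a scheme table, one inter-function table per function room, and their association through the scheme id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Scheme table: one row per VR panorama scheme (scheme id + scheme link).
conn.execute("""CREATE TABLE scheme (
                    scheme_id INTEGER PRIMARY KEY,
                    panorama_link TEXT NOT NULL)""")
# One inter-function table per function-room label (labels per step 3.1);
# each row holds a serialized picture feature vector and its scheme id.
for room in ("kitchen", "living_room", "bedroom", "dining_room", "toilet"):
    conn.execute(f"""CREATE TABLE {room} (
                         feature_vector BLOB NOT NULL,
                         scheme_id INTEGER REFERENCES scheme(scheme_id))""")
conn.commit()
```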
Further, in step 2, the specific steps of extracting the inter-function labels and the inter-function picture feature vectors by using the convolutional neural network model are as follows:
step 2.1, constructing a convolutional neural network which sequentially comprises a CONV1 layer, a Max1 layer, a CONV2 layer, a Max2 layer, a CONV3 layer, a Max3 layer, a Flatten layer, an FC1 layer, an FC2 layer, an FC3 layer and a Softmax classifier, as shown in FIG. 3; the whole network is trained with small convolution kernels, none larger than 5×5; Table 1 details the kernel size, stride, input/output matrix sizes and parameter count of each layer.
Table 1. Effect-picture classification network structure

Layer | Input size | Output size | Description | Parameters
CONV1 | 224×224×3 | 110×110×32 | Convolution: 32 kernels, 5×5, s=2 | 2432
Max1 | 110×110×32 | 55×55×32 | Kernel: 2×2, s=1 | 0
CONV2 | 55×55×32 | 28×28×64 | Convolution: 64 kernels, 3×3, s=2 | 18496
Max2 | 28×28×64 | 14×14×64 | Kernel: 2×2, s=1 | 0
CONV3 | 14×14×64 | 7×7×128 | Convolution: 128 kernels, 3×3, s=2 | 73856
Max3 | 7×7×128 | 3×3×128 | Kernel: 2×2, s=1 | 0
Flatten | 3×3×128 | 1152 | Flatten | 0
FC1 | 1152 | 4096 | Fully connected | 4722688
FC2 | 4096 | 2048 | Fully connected | 8390656
FC3 | 2048 | 6 | Softmax classification | 12294
As shown in FIG. 3, the model makes full use of the feature-extraction capability of the convolutional and pooling layers and the integration capability of the fully connected layers, autonomously learning image features for the final improved Softmax classifier. A pooling operation follows the convolution computation at each layer, reducing the number of network parameters and improving training speed. The network input is a 224×224×3 digital matrix; in the figure, each arrow represents a layer transition and each rectangle represents the data matrix at the current layer of the network. After ten transformations, the data reaches the improved Softmax classifier, which produces the final classification result.
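For concreteness, the following Keras sketch reproduces the layer sizes listed in Table 1. The framework choice, the padding modes and the ReLU activations of the base model are assumptions inferred from the listed input/output sizes and the later description (note that the pooling outputs in Table 1 imply a pooling stride of 2):

```python
from tensorflow.keras import layers, models

def build_effect_picture_classifier(num_classes=6):
    """Sketch of the Table 1 network; layer names follow the patent.
    ReLU on the conv layers is an assumption for the base model; step 2.3
    later replaces it with ELU."""
    return models.Sequential([
        layers.Conv2D(32, 5, strides=2, activation="relu", padding="valid",
                      input_shape=(224, 224, 3), name="CONV1"),  # -> 110x110x32
        layers.MaxPooling2D(2, strides=2, name="Max1"),          # -> 55x55x32
        layers.Conv2D(64, 3, strides=2, activation="relu",
                      padding="same", name="CONV2"),             # -> 28x28x64
        layers.MaxPooling2D(2, strides=2, name="Max2"),          # -> 14x14x64
        layers.Conv2D(128, 3, strides=2, activation="relu",
                      padding="same", name="CONV3"),             # -> 7x7x128
        layers.MaxPooling2D(2, strides=2, name="Max3"),          # -> 3x3x128
        layers.Flatten(name="Flatten"),                          # -> 1152
        layers.Dense(4096, activation="relu", name="FC1"),
        layers.Dense(2048, activation="relu", name="FC2"),
        layers.Dense(num_classes, activation="softmax", name="FC3"),
    ])
```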
step 2.2, training the constructed convolutional neural network with sample data, using Batch training with a batch-size of 64; during training, data are fed into the network model, the output of the network is computed layer by layer, and the parameters are then updated by gradient descent so that the network model approaches the optimal solution. The Batch mode makes full use of computer resources. Using the trained network model on 1000 test samples, the optimal training accuracy and test accuracy under different batch-size settings were recorded; the results are shown in Table 2.
Table 2. Accuracy at different batch-sizes

batch-size | 16 | 32 | 64 | 128 | 256
Optimal training accuracy (%) | 99.8 | 99.7 | 99.6 | 99.4 | 99.7
Test accuracy (%) | 90.1 | 91.0 | 91.5 | 90.7 | 89.9
Table 2 shows that although batch-size affects the training speed and the number of iterations and time needed to reach a specified accuracy, it does not cause a significant drop in training accuracy. After sufficient training time, the optimal training accuracy in the experiments exceeds 99%, and the test accuracy on the 1000 indoor effect pictures exceeds 89%. Weighing these statistics against the hardware of the experimental computer and the time required for training, and noting that the test accuracy peaks at 91.5% when the batch-size is 64, a batch-size of 64 is used to train the final network model.
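A matching training sketch under the same assumptions. The optimizer settings, epoch count and placeholder data are illustrative; the patent specifies only layer-by-layer forward computation, gradient-descent parameter updates and a batch-size of 64:

```python
import numpy as np

# Placeholder sample data standing in for the patent's effect-picture set.
x_train = np.random.rand(640, 224, 224, 3).astype("float32")
y_train = np.random.randint(0, 6, size=640)  # integer function-room labels

model = build_effect_picture_classifier(num_classes=6)
model.compile(optimizer="sgd",  # gradient descent, per step 2.2
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=64, epochs=30, validation_split=0.1)
```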
Step 2.3, optimizing the convolutional neural network model by using a Dropout and ELU activation function;
and 2.4, extracting inter-function labels and inter-function picture feature vectors of the input image to be retrieved by utilizing the optimized convolutional neural network.
Further, in step 2.1, the input of the convolutional neural network is a 224×224×3 digital matrix; the CONV1, CONV2 and CONV3 layers are convolutional layers; the Max1, Max2 and Max3 layers are max-pooling layers; the FC1, FC2 and FC3 layers are fully connected layers; and the Flatten layer is an intermediate layer that unrolls the convolutional output into the fully connected layers.
Further, in step 2.3, the convolutional neural network model is optimized by using Dropout: a Dropout layer is added between the FC1 layer and the FC2 layer and between the FC2 layer and the FC3 layer, with the Dropout parameter set to 0.3. Adding the two Dropout layers optimizes the network model and improves test accuracy. With a batch-size of 64, the training and test accuracies for Dropout parameters of 0.3, 0.5 and 0.7 are shown in Table 3.
Table 3. Accuracy at different Dropout parameters

Dropout parameter | 0.3 | 0.5 | 0.7
Optimal training accuracy (%) | 99.1 | 98.7 | 97.7
Test accuracy (%) | 96.3 | 95.7 | 93.6
Introducing Dropout improves test accuracy; the experiments show that test accuracy is highest with a Dropout parameter of 0.3, which suits this convolutional neural network model and the effect-picture data.
Further, in step 2.3, when the convolutional neural network model is optimized by using the ELU activation function, the ELU activation function is applied to the CONV1 layer, the CONV2 layer and the CONV3 layer, the ELU activation function being:
f(x) = x, if x > 0;  f(x) = α(e^x − 1), if x ≤ 0
where α is an adjustable parameter that controls when the negative part of the ELU activation function reaches saturation, x represents the function input and f(x) represents the activation output. All fully connected layers still use the ReLU activation function. With the network model and hyperparameters otherwise unchanged (batch-size 64, Dropout parameter 0.3), the accuracies are shown in Table 4. The experiments show that the ELU activation function improves the classification accuracy of the model by 0.6%, raising the test accuracy of the network to 96.9%.
Table 4. Accuracy with different activation functions
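As a worked illustration of the ELU formula above, a minimal NumPy version (α defaults to 1.0 here as an assumption; the patent treats it as adjustable):

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: identity for x > 0, alpha*(exp(x) - 1) for x <= 0 (step 2.3)."""
    x = np.asarray(x, dtype=float)
    neg = alpha * (np.exp(np.minimum(x, 0.0)) - 1.0)  # exp only sees x <= 0
    return np.where(x > 0, x, neg)
```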
Furthermore, because the original Softmax uses exponential operations, large inputs make the exponentials grow extremely fast, and division between such oversized numbers easily overflows; once overflow occurs, classification fails or gives wrong results. The original Softmax classifier judges the class of the current input by the probability formula:
S_i = e^(V_i) / Σ_{j=1..length} e^(V_j)
in order to improve the reliability of the convolutional neural network model, in step 2.1, the Softmax classifier adopts an improved Softmax classifier, and the calculation formula of the improved Softmax classifier is as follows:
where V represents an input vector, i represents a position index corresponding to the input vector, length represents a dimension of the input vector, and S represents a corresponding probability result. The improved Softmax classifier is kept consistent with the classification of the original classifier, but the improved Softmax classifier does not have all the super-large values or all the super-small values, and effectively solves the overflow problem of Softmax.
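A minimal NumPy sketch of this overflow-safe form; subtracting max(V) before exponentiation is the standard stabilization consistent with the description above:

```python
import numpy as np

def improved_softmax(v):
    """Numerically stable Softmax: shifting by max(V) leaves the
    probabilities unchanged but keeps every exponent <= 0, so exp()
    cannot overflow (step 2.1)."""
    shifted = v - np.max(v)
    exps = np.exp(shifted)
    return exps / np.sum(exps)
```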
The resulting convolutional neural network model structure is shown in table 5.
Table 5. Final effect-picture classification network structure

Layer | Input size | Output size | Description | Parameters
CONV1 | 224×224×3 | 110×110×32 | Convolution: 32 kernels, 5×5, s=2, ELU activation | 2432
Max1 | 110×110×32 | 55×55×32 | Kernel: 2×2, s=1 | 0
CONV2 | 55×55×32 | 28×28×64 | Convolution: 64 kernels, 3×3, s=2, ELU activation | 18496
Max2 | 28×28×64 | 14×14×64 | Kernel: 2×2, s=1 | 0
CONV3 | 14×14×64 | 7×7×128 | Convolution: 128 kernels, 3×3, s=2, ELU activation | 73856
Max3 | 7×7×128 | 3×3×128 | Kernel: 2×2, s=1 | 0
Flatten | 3×3×128 | 1152 | Flatten | 0
FC1 | 1152 | 4096 | Fully connected, ReLU activation | 4722688
Dropout1 | 4096 | 4096 | Dropout = 0.3 | 0
FC2 | 4096 | 2048 | Fully connected, ReLU activation | 8390656
Dropout2 | 2048 | 2048 | Dropout = 0.3 | 0
FC3 | 2048 | 6 | Improved Softmax classifier | 12294
The convolutional neural network model improves the classification accuracy of the indoor effect graph to 96.9%, and has higher intelligence and accuracy.
Further, in step 2.4, when the optimized convolutional neural network is used to extract the inter-function labels and the inter-function picture feature vectors of the input image to be searched, the improved Softmax classifier outputs the inter-function labels, and the output data of the FC1 layer is intercepted as the inter-function picture feature vectors, i.e. the 4096-dimensional output of the FC1 layer is used as the feature description of the image to be searched.
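Under the Keras sketch above, intercepting the FC1 output amounts to a truncated sub-model; the layer name "FC1" and the placeholder input below are assumptions carried over from the earlier sketch:

```python
import numpy as np
from tensorflow.keras.models import Model

model = build_effect_picture_classifier(num_classes=6)  # trained weights assumed loaded
feature_extractor = Model(inputs=model.input,
                          outputs=model.get_layer("FC1").output)

query_images = np.random.rand(1, 224, 224, 3)       # placeholder input image
probs = model.predict(query_images)                 # improved-Softmax probabilities
labels = probs.argmax(axis=-1)                      # inter-function label indices
features = feature_extractor.predict(query_images)  # 4096-dim FC1 feature vectors
```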
Further, the feature vector and the label of the image are automatically extracted by using the convolutional neural network, and in the step 3, the specific steps of carrying out label retrieval in the VR panorama scheme database by using the feature vector of the inter-function label and the inter-function picture are as follows:
step 3.1, judging through the inter-function label whether the input image to be retrieved is a valid function-room image, i.e. whether the acquired inter-function label falls within the range kitchen, living room, bedroom, dining room or toilet; if so, entering step 3.2; if not (for example, the cat image in FIG. 2), prompting the user to input a new image to be retrieved and returning to step 2;
step 3.2, retrieving in the VR panorama scheme database with the inter-function label of the image to be retrieved, and locating the corresponding inter-function table according to the matched label; for example, when the inter-function label is kitchen, the kitchen table is selected, and when it is living room, the living room table is selected;
step 3.3, obtaining the similarity between the feature vector of the image to be retrieved and each inter-function picture feature vector stored in the inter-function table, the similarity being calculated as:
d = √( Σ_i (h_i − q_i)² ),  s = 1 / (1 + d)
where d is the Euclidean distance, s is the similarity, h and q are the two input vectors, and i is the index into the input vectors;
step 3.4, sorting the similarity results and selecting the top N inter-function picture feature vectors with the highest similarity, where N may be 50; saving 50 results avoids repeated calculation when the user changes the number of screening results on the fly and increases the interactivity of the whole system; the scheme ids corresponding to these feature vectors are saved. Because each VR panorama scheme contains several inter-function pictures, the maximum of their similarities is taken as the similarity between that VR panorama scheme and the input image to be retrieved, which increases the comprehensiveness of the system and improves matching accuracy;
and 3.5, finding a corresponding VR panorama scheme link in the scheme table according to the saved scheme id, and finding each matched VR panorama scheme according to the VR panorama scheme link.
Further, in step 3.3, before the similarity calculation, the feature vector of the image to be retrieved and the inter-function picture feature vectors in the inter-function table are reduced to 256 dimensions with the PCA algorithm. Because each extracted feature vector contains 4096 numbers, computing Euclidean distances directly is expensive and seriously slows the algorithm; reducing the 4096-dimensional vectors to 256 dimensions follows the PCA idea of keeping uncorrelated components and discarding highly correlated ones, effectively lowering CPU usage at run time, speeding up the algorithm and improving the user experience.
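Putting steps 3.3 to 3.5 together, a retrieval sketch follows. The placeholder arrays stand in for one inter-function table, and the conversion s = 1/(1+d) is an assumption, since the patent states only that the similarity s is derived from the Euclidean distance d:

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholders standing in for one inter-function table (M stored pictures).
db_features = np.random.rand(1000, 4096)   # stored FC1 feature vectors
db_scheme_ids = np.arange(1000)            # scheme id for each stored vector
query_vec = np.random.rand(1, 4096)        # FC1 vector of the image to be retrieved

pca = PCA(n_components=256).fit(db_features)
db_reduced = pca.transform(db_features)    # reduce stored vectors to 256 dims
query = pca.transform(query_vec)[0]        # reduce the query vector likewise

d = np.linalg.norm(db_reduced - query, axis=1)  # Euclidean distance (step 3.3)
s = 1.0 / (1.0 + d)                             # similarity: the 1/(1+d) form is an assumption
top_n = np.argsort(-s)[:50]                     # keep the top N = 50 matches (step 3.4)
matched_ids = db_scheme_ids[top_n]              # scheme ids -> scheme links (step 3.5)
```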
As described above, although the present application has been shown and described with reference to certain preferred embodiments, it is not to be construed as limiting the application itself. Various changes in form and details may be made therein without departing from the spirit and scope of the application as defined by the appended claims.

Claims (10)

1. The VR panorama scheme matching method based on the intelligent image retrieval is characterized by comprising the following steps:
step 1, constructing a VR panorama scheme database, wherein VR panorama scheme links, inter-function labels and inter-function picture feature vectors are stored in the VR panorama scheme database, and the VR panorama scheme links, the inter-function labels and the inter-function picture feature vectors are mutually associated;
step 2, extracting inter-function labels and inter-function picture feature vectors of the input image to be retrieved by using a convolutional neural network model;
and 3, carrying out tag retrieval on the VR panorama scheme database by utilizing the inter-function tags and the inter-function picture feature vectors to find a matched VR panorama scheme.
2. The VR panorama scheme matching method based on image intelligent retrieval according to claim 1, wherein in step 1, the specific steps of constructing a VR panorama scheme database are as follows:
step 1.1, a scheme table and a plurality of inter-function tables are arranged in a VR panorama scheme database, the scheme table is used for storing each VR panorama scheme link and corresponding scheme id, each inter-function table is used for respectively storing each inter-function picture feature vector and corresponding scheme id, inter-function labels correspond to the inter-function tables, and the VR panorama scheme links, the inter-function labels and the inter-function picture feature vectors are mutually related through the scheme id;
step 1.2, obtaining scheme data from an existing VR panorama scheme, wherein the scheme data comprises VR panorama scheme links, scheme ids, inter-function labels and inter-function picture feature vectors;
and 1.3, correspondingly storing the obtained scheme data in a scheme table and a function-to-function table, thereby establishing a VR panorama scheme database.
3. The VR panorama scheme matching method based on image intelligent retrieval according to claim 1, wherein in step 2, the specific steps of extracting the inter-function labels and the inter-function picture feature vectors by using the convolutional neural network model are as follows:
step 2.1, constructing a convolutional neural network which sequentially comprises a CONV1 layer, a Max1 layer, a CONV2 layer, a Max2 layer, a CONV3 layer, a Max3 layer, a Flatten layer, an FC1 layer, an FC2 layer, an FC3 layer and a Softmax classifier;
step 2.2, training the constructed convolutional neural network by using sample data, and training the convolutional neural network by using a Batch training mode with a Batch-size of 64; during training, data are transmitted into a network model, the output value of the network is calculated layer by layer, and finally, the gradient descent algorithm is utilized to update parameters so that the network model approaches to an optimal solution;
step 2.3, optimizing the convolutional neural network model by using a Dropout and ELU activation function;
and 2.4, extracting inter-function labels and inter-function picture feature vectors of the input image to be retrieved by utilizing the optimized convolutional neural network.
4. The VR panorama scheme matching method based on image intelligent retrieval according to claim 3, wherein in step 2.1, the input of the convolutional neural network is a 224×224×3 digital matrix; the CONV1, CONV2 and CONV3 layers are convolutional layers; the Max1, Max2 and Max3 layers are max-pooling layers; the FC1, FC2 and FC3 layers are fully connected layers; and the Flatten layer is an intermediate layer that unrolls the convolutional output into the fully connected layers.
5. The VR panorama scheme matching method based on image intelligent retrieval according to claim 3, wherein in step 2.3, a Dropout layer is added between the FC1 layer and the FC2 layer and between the FC2 layer and the FC3 layer, and the Dropout parameter is set to 0.3.
6. The VR panorama scheme matching method based on image intelligent retrieval according to claim 3, wherein in step 2.3, when the convolutional neural network model is optimized by using the ELU activation function, the ELU activation function is applied to the CONV1 layer, the CONV2 layer and the CONV3 layer, the ELU activation function being:
f(x) = x, if x > 0;  f(x) = α(e^x − 1), if x ≤ 0
where α is an adjustable parameter that controls when the negative part of the ELU activation function reaches saturation, x represents the function input and f(x) represents the activation output.
7. The VR panorama scheme matching method based on image intelligent retrieval of claim 3, wherein in step 2.1, the Softmax classifier is an improved Softmax classifier, whose calculation formula is:
S_i = e^(V_i − max(V)) / Σ_{j=1..length} e^(V_j − max(V))
where V represents the input vector, i represents a position index into the input vector, length represents the dimension of the input vector, and S represents the corresponding probability result.
8. The VR panorama scheme matching method based on image intelligent retrieval according to claim 7, wherein in step 2.4, when the inter-function label and the inter-function picture feature vector of the input image to be retrieved are extracted by using the optimized convolutional neural network, the inter-function label is output by the improved Softmax classifier, and the output data of the FC1 layer is intercepted as the inter-function picture feature vector.
9. The VR panorama scheme matching method based on image intelligent retrieval according to claim 2, wherein in step 3, the specific steps of performing label retrieval in the VR panorama scheme database by using inter-function labels and inter-function picture feature vectors are as follows:
step 3.1, judging whether the input image to be searched is an effective inter-function image through the inter-function label, if so, entering step 3.2, if not, prompting to input the image to be searched again, and returning to step 2;
step 3.2, searching in the VR panorama scheme database by utilizing the inter-function label of the image to be searched, searching for the corresponding inter-function label of the image to be searched in the VR panorama scheme database, and finding out the corresponding inter-function table in the VR panorama scheme database according to the inter-function label obtained by searching;
step 3.3, obtaining the similarity between the feature vector of the image to be retrieved and each inter-function picture feature vector stored in the inter-function table, the similarity being calculated as:
d = √( Σ_i (h_i − q_i)² ),  s = 1 / (1 + d)
where d is the Euclidean distance, s is the similarity, h and q are the two input vectors, and i is the index into the input vectors;
step 3.4, sorting the similarity results, selecting the top N inter-function picture feature vectors with the highest similarity, and storing the scheme ids corresponding to those feature vectors;
and 3.5, finding a corresponding VR panorama scheme link in the scheme table according to the saved scheme id, and finding each matched VR panorama scheme according to the VR panorama scheme link.
10. The VR panorama scheme matching method based on intelligent image retrieval according to claim 9, wherein in step 3.3, before similarity calculation, feature vectors of the input image to be retrieved and feature vectors of the inter-function pictures in the inter-function table are subjected to dimension reduction processing by using a PCA algorithm, and feature vectors of the image to be retrieved and feature vectors of the inter-function pictures in the inter-function table are reduced to 256 dimensions.
CN202010809509.6A 2020-08-12 2020-08-12 VR panorama scheme matching method based on image intelligent retrieval Active CN112115290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010809509.6A CN112115290B (en) 2020-08-12 2020-08-12 VR panorama scheme matching method based on image intelligent retrieval

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010809509.6A CN112115290B (en) 2020-08-12 2020-08-12 VR panorama scheme matching method based on image intelligent retrieval

Publications (2)

Publication Number Publication Date
CN112115290A CN112115290A (en) 2020-12-22
CN112115290B true CN112115290B (en) 2023-11-10

Family

ID=73804135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010809509.6A Active CN112115290B (en) 2020-08-12 2020-08-12 VR panorama scheme matching method based on image intelligent retrieval

Country Status (1)

Country Link
CN (1) CN112115290B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107256246A (en) * 2017-06-06 2017-10-17 西安工程大学 PRINTED FABRIC image search method based on convolutional neural networks
WO2020145981A1 (en) * 2019-01-10 2020-07-16 Hewlett-Packard Development Company, L.P. Automated diagnoses of issues at printing devices based on visual data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635141B (en) * 2019-01-29 2021-04-27 京东方科技集团股份有限公司 Method, electronic device, and computer-readable storage medium for retrieving an image

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107256246A (en) * 2017-06-06 2017-10-17 西安工程大学 PRINTED FABRIC image search method based on convolutional neural networks
WO2020145981A1 (en) * 2019-01-10 2020-07-16 Hewlett-Packard Development Company, L.P. Automated diagnoses of issues at printing devices based on visual data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于卷积神经网络的鸟类视频图像检索研究 [Research on bird video image retrieval based on convolutional neural networks]; 张惠凡 (Zhang Huifan); 罗泽 (Luo Ze); 科研信息化技术与应用 [E-Science Technology & Application], Issue 05; full text *

Also Published As

Publication number Publication date
CN112115290A (en) 2020-12-22

Similar Documents

Publication Publication Date Title
CN110866140B (en) Image feature extraction model training method, image searching method and computer equipment
CN111427995B (en) Semantic matching method, device and storage medium based on internal countermeasure mechanism
CN108875076B (en) Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network
CN104317834B (en) A kind of across media sort methods based on deep neural network
CN101739428B (en) Method for establishing index for multimedia
CN109525892B (en) Video key scene extraction method and device
JP2010504593A (en) Extracting dominant colors from an image using a classification technique
CN110110800B (en) Automatic image annotation method, device, equipment and computer readable storage medium
KR20120053211A (en) Method and apparatus for multimedia search and method for pattern recognition
CN109961095B (en) Image labeling system and method based on unsupervised deep learning
CN110175249A (en) A kind of search method and system of similar pictures
CN106776896A (en) A kind of quick figure fused images search method
CN109063112A (en) A kind of fast image retrieval method based on multi-task learning deep semantic Hash, model and model building method
CN109711465A (en) Image method for generating captions based on MLL and ASCA-FR
CN105956631A (en) On-line progressive image classification method facing electronic image base
Wang et al. Aspect-ratio-preserving multi-patch image aesthetics score prediction
Shi et al. A benchmark and baseline for language-driven image editing
CN115457332A (en) Image multi-label classification method based on graph convolution neural network and class activation mapping
CN116310466A (en) Small sample image classification method based on local irrelevant area screening graph neural network
US20230072445A1 (en) Self-supervised video representation learning by exploring spatiotemporal continuity
CN108304588B (en) Image retrieval method and system based on k neighbor and fuzzy pattern recognition
CN112115290B (en) VR panorama scheme matching method based on image intelligent retrieval
CN110765305A (en) Medium information pushing system and visual feature-based image-text retrieval method thereof
CN113192108B (en) Man-in-loop training method and related device for vision tracking model
CN112101559B (en) Case crime name deducing method based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant