CN108108770A - Mobile visual search framework based on CRBM and Fisher networks - Google Patents

Mobile visual search framework based on CRBM and Fisher networks Download PDF

Info

Publication number
CN108108770A
Authority
CN
China
Prior art keywords
network
layer
fisher
algorithm
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711493995.XA
Other languages
Chinese (zh)
Inventor
纪荣嵘
林贤明
黄晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN201711493995.XA priority Critical patent/CN108108770A/en
Publication of CN108108770A publication Critical patent/CN108108770A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/32Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N1/32101Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N1/32144Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title embedded in the image data, i.e. enclosed or integrated in the image, e.g. watermark, super-imposed logo or stamp
    • H04N1/32149Methods relating to embedding, encoding, decoding, detection or retrieval operations
    • H04N1/32203Spatial or amplitude domain methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/32Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N1/32101Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N1/32144Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title embedded in the image data, i.e. enclosed or integrated in the image, e.g. watermark, super-imposed logo or stamp
    • H04N1/32149Methods relating to embedding, encoding, decoding, detection or retrieval operations
    • H04N1/32267Methods relating to embedding, encoding, decoding, detection or retrieval operations combined with processing of the image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

A mobile visual search framework based on a CRBM and a Fisher network, relating to image retrieval on mobile terminals. The framework includes: 1) constructing and training a continuous restricted Boltzmann machine (CRBM) network; 2) constructing and training a Fisher layer network. In the algorithm that aggregates local features into a global compact binary feature, the nonlinear dimensionality-reduction algorithm CRBM is used to find the essential subspace information of the non-Gaussian-distributed local features, and a Fisher-based network structure is used to aggregate the Fisher Vector, yielding a more discriminative global feature. A scalar quantization algorithm and a bit-adaptive algorithm produce a compact, adaptive feature, so the length of the transmitted image feature information can be selected adaptively according to the available mobile network bandwidth. In the retrieval stage, the global feature is used for coarse matching to obtain a candidate set, and the local features are used for a geometric consistency check to perform exact matching, so the framework adapts to large-scale image retrieval tasks.

Description

Mobile visual search framework based on CRBM and Fisher network
Technical Field
The invention relates to image retrieval on mobile terminals, and in particular to a mobile visual search framework based on a CRBM and a Fisher network.
Background
The International Telecommunication Union (ITU), a United Nations agency, publishes annual statistics on the number of users accessing mobile broadband worldwide. According to its 2015 and 2016 statistics, the number of users accessing the network through mobile devices grew from 3.2 billion in 2015 to 3.6 billion in 2016, a net increase of 400 million. By 2016, roughly 47% of the world's population was using mobile devices to access the internet. The ITU attributes this rapid growth to the fact that mobile broadband services had become available across 84% of the world, the result of significant global investment in broadband infrastructure. Meanwhile, with the development of mobile broadband, 3G and 4G networks have spread worldwide, and the emerging 4.5G and 5G networks are drawing more of the world's residents into the wave of the mobile internet. From Industry 3.0 to the present, and on toward Industry 4.0, the internet has drastically changed human life, and people now treat it as a basic tool of daily living. Government support for internet construction keeps raising network speeds and lowering network costs. At the same time, internet education has reached primary and secondary schools, so every student can receive information-technology courses and the younger generation grows up with the internet. In December 2016 the China Internet Network Information Center (CNNIC) issued its 39th Statistical Report on Internet Development in China. The report shows that the number of internet users in China had grown to 731 million, roughly the total population of Europe, and that internet penetration in China had reached 53.2%. The number of mobile broadband users reached 695 million, with a growth rate exceeding 10% for three consecutive years, and mobile devices continue to displace desktop and notebook computers. Although the demographic dividend is diminishing, the base of mobile internet users is very large. With this enormous base of mobile broadband users there is strong demand for mobile services and multimedia information, and more and more users are eager to experience future technologies, especially the user experience brought by mobile internet technology.
In the field of mobile device manufacturing, more and more leading IT manufacturers are competing to produce smartphones and tablet computers and frequently release new models, such as Huawei, OPPO, vivo, Xiaomi and ZTE in China. According to statistics from the market research firm IHS Technology on global handset market share and total sales in 2016, Chinese handset makers, together with OPPO, reached third and fourth place in the global ranking. These manufacturers' new mobile devices are equipped with multiple sensing components: cameras, GPS, gravity sensors, electronic compasses and other devices have become standard equipment and are continuously upgraded. Thanks to this hardware and the various sensors it carries, powerful applications running on the mobile terminal can quickly connect the real world with the information world, and users can conveniently obtain the multimedia information and network services they need in real time through the mobile network. It is safe to say that picture-based search will become one of the core technologies of future mobile internet applications. A picture usually contains a large amount of information; capturing an image of an object of interest with a mobile device is more convenient than typing text and can retrieve more information than a text search. To give a simple example, if a tourist is interested in a building, the tourist can take a photo of the scene through an application on the phone and search in real time, obtaining results that a text search could not reach without first knowing the building's name. The user acquires visual images together with GPS, electronic-compass and other sensor information from objects in the real world through the mobile intelligent terminal, transmits this information over the mobile internet to a large-scale visual database for retrieval of the related information, and finally the results are returned to the user side over the mobile internet.
Combined with other applications such as augmented reality (AR), mobile visual search has given rise to emerging service models. For example, a user takes a real-time picture of an object or a tool, the mobile visual search application identifies its basic information, and an augmented reality program then reproduces a three-dimensional geometric model of the object on the mobile terminal together with a dynamic demonstration of how it is used. If mobile visual search is combined with mobile location services, a user can obtain shopping-mall information, brand information, price information and location information for a brand simply by opening the camera of the mobile terminal. Or, at the scene of an emergency, investigators can photograph the accident site through the mobile visual search application and then obtain an augmented-reality presentation of the site's three-dimensional geometric structure through a mobile AR application in order to work out how to handle the situation.
Mobile visual search faces a number of technical challenges:
First, MVS [1] (Huang Xiaobin et al., a review of mobile visual search research abroad, Journal of Library Science in China, 2014, 40(3): 114-128) searches in a large-scale image database and therefore faces problems such as limited retrieval accuracy and long retrieval times. The database usually stores massive amounts of image data, of which only a small fraction is relevant to a given query; the rest is interference. This poses a significant challenge to MVS retrieval accuracy. Users demand real-time response from mobile search applications, so long retrieval times in a large-scale image database severely degrade the user experience. To address this, the core idea of mobile visual search algorithms is to quickly establish the relationship between a query image and the most relevant images in the database [2] (Gu Jia, Tang Sheng, Xie Hongtao, et al., A survey of mobile visual search, Journal of Computer-Aided Design & Computer Graphics, 2017, 29(6): 1007-1021). The visual features of an image therefore need strong discriminative power while remaining compact, so that compact-feature matching can quickly locate the relevant information in a massive database.
Second, current mobile network bandwidth is unstable and speeds are limited, so transmitting an image introduces significant delay. Because of wireless bandwidth constraints, uploading a whole query picture causes a large delay and greatly harms the user experience; it is therefore natural to transmit the feature information of a picture rather than the picture itself. Network bandwidth and signal strength differ from region to region, so the picture feature information must also be scalable to different bandwidths, and fast, accurate matching of feature information of different lengths becomes equally important.
Third, the computing power and storage capacity of mobile device hardware are limited. Although both keep improving, they remain constrained, so real-time visual feature extraction is a challenge for more complex feature extraction and multitasking scenarios. The visual feature extraction algorithm on the mobile terminal must therefore have low time and space complexity and extract compact visual features, while still guaranteeing the discriminative power of the features and the retrieval accuracy and meeting the real-time requirement.
Combining the above challenges, visual features for mobile visual search need to be discriminative, compact and scalable, and at the same time the visual algorithm must have low complexity.
In the face of these technical challenges, the most effective mobile visual search framework at present is the Compact Descriptors for Visual Search (CDVS) standard formulated by the Moving Picture Experts Group (MPEG) [3] (IEEE Transactions on Image Processing, 2016, 25(1): 179), whose goal is a bitstream syntax standard for interoperable image retrieval applications. In that framework, however, the linear dimensionality-reduction algorithm PCA (principal component analysis) is used to reduce the dimension of the local SIFT features, whose statistics are non-Gaussian, which greatly damages the local feature information; and the traditional EM algorithm is used to estimate the parameters of the Gaussian mixture model with K Gaussian components in the global Fisher Vector feature, but the setting of the initial parameters strongly affects the convergence of the EM algorithm, which easily converges to a local optimum.
Disclosure of Invention
The invention aims to provide a mobile visual search framework based on a CRBM and a Fisher network, addressing the above problems of the existing mobile visual search framework CDVS.
The invention comprises the following steps:
1) constructing and training a continuous restricted Boltzmann machine (CRBM) network;
2) constructing and training a Fisher layer network.
In step 1), the specific method for constructing and training the continuous restricted Boltzmann machine network is as follows:
(1) Construction: a 3-layer continuous restricted Boltzmann machine network is built, in which the first layer has 128 units, the second layer has 64 units and the third layer has 32 units; for each pair of adjacent layers, the former layer serves as the visible units and the latter layer as the hidden units. The visible units and hidden units are fully connected, with connection weights {w}. A continuous restricted Boltzmann machine (CRBM) adds a zero-mean Gaussian-noise continuous stochastic component inside the sigmoid activation of the visible-layer units of an RBM network; its structure is otherwise the same as an RBM, consisting of a visible layer and a hidden layer whose units are connected across layers. Information flows in both directions when the network is trained and used, and the weights in the two directions are equal, i.e. w_ij = w_ji. Let s_j denote the output of neuron j and {s_i} the states of the input neurons, with the hidden-layer state h_j represented by s_j and the visible-layer state v_i represented by s_i; then:

s_j = φ_j( Σ_i w_ij·s_i + n_j )

where the noise component n_j = σ·N_j(0,1), N_j(0,1) is a Gaussian random variable with mean 0 and variance 1, and the constant σ scales it, so that n_j has the probability distribution

p(n_j) = (1/√(2πσ²))·exp(−n_j²/(2σ²));

φ_j(·) is the sigmoid-shaped function

φ_j(x) = θ_L + (θ_H − θ_L)·1/(1 + exp(−a_j·x)),

where θ_L and θ_H are its lower and upper asymptotes and the parameter a_j controls its slope. As a_j grows from small to large, the unit transitions smoothly from a noise-free deterministic state to a binary stochastic state; if a_j keeps the sigmoid effectively linear over the noise range, then s_j follows a Gaussian distribution whose mean is determined by the weighted input Σ_i w_ij·s_i and whose variance is σ².
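As an illustration of the unit activation defined above, the following minimal sketch (an assumption-laden example rather than the patent's own code; the parameters theta_L, theta_H, a and sigma mirror the asymptotes, slope parameters and noise level just described) computes the hidden states of one CRBM layer:

    import numpy as np

    def crbm_activation(s_in, W, a, sigma=0.2, theta_L=0.0, theta_H=1.0, rng=None):
        """s_in: (n_visible,) input states; W: (n_visible, n_hidden) weights;
        a: (n_hidden,) slope-control parameters. Returns the hidden states s_j."""
        rng = np.random.default_rng() if rng is None else rng
        # Weighted input plus the zero-mean Gaussian noise component n_j = sigma * N_j(0, 1)
        x = s_in @ W + sigma * rng.standard_normal(W.shape[1])
        # Noisy sigmoid with lower/upper asymptotes theta_L, theta_H and slope a_j
        return theta_L + (theta_H - theta_L) / (1.0 + np.exp(-a * x))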
(2) Training:
The CRBM network parameters are trained with a minimizing-contrastive-divergence (MCD) weight-update algorithm, which requires only simple additions and multiplications and therefore keeps the computation small. The MCD training criterion updates the connection weights {w_ij} and the slope-control parameters {a_j} of the sigmoid function:

Δw_ij ∝ ⟨s_i·s_j⟩ − ⟨ŝ_i·ŝ_j⟩

where ŝ_j denotes the one-step sampled (reconstructed) state of neuron j and ⟨·⟩ denotes the mean over the training set. The simplified a_j update rule is:

Δa_j ∝ (1/a_j²)·(⟨s_j²⟩ − ⟨ŝ_j²⟩)
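The two update rules can be written out as the following sketch (illustrative only; the learning rates lr_w and lr_a are assumed, and the reconstructed states are taken to come from one step of sampling as in contrastive divergence):

    import numpy as np

    def mcd_update(S, S_hat, H, H_hat, W, a, lr_w=1e-3, lr_a=1e-3):
        """S, S_hat: (batch, n_visible) data and one-step reconstructed visible states;
        H, H_hat: (batch, n_hidden) hidden states driven by S and S_hat."""
        batch = S.shape[0]
        # Delta w_ij proportional to <s_i s_j> - <s_hat_i s_hat_j>, averaged over the batch
        W += lr_w * (S.T @ H - S_hat.T @ H_hat) / batch
        # Delta a_j proportional to (1 / a_j^2) * (<s_j^2> - <s_hat_j^2>)
        a += (lr_a / a**2) * ((H**2).mean(axis=0) - (H_hat**2).mean(axis=0))
        return W, a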
(3) Supervised fine-tuning:
The weights obtained after the CRBM is trained with the contrastive divergence algorithm (MCD) are already close to the global optimum, but to make the network more robust a back-propagation (BP) algorithm is used for fine-tuning. The desired output target {V'_i} is set equal to the input data {V_i}; the error between the model output and the input is used to adjust every weight gradient until the error converges. The parameters to be adjusted are the connection weights between layers, the bias weights of each layer, and the sigmoid slope parameters a_j. The objective function is

J(W,b,a) = (1/2)·‖F_W,b,a(x) − x‖²

where x is the network input data value and F_W,b,a(x) is the network output value. For each output neuron i of layer L (the output layer), the residual is

δ_i^(L) = −(x_i − F_W,b,a(x)_i)·f′(z_i^(L)).

For l = L−1, ..., 2, the residual of the i-th neuron node in layer l is

δ_i^(l) = ( Σ_j w_ij^(l)·δ_j^(l+1) )·f′(z_i^(l)),

where f(z_i) is the neuron activation function

f(z) = θ_L + (θ_H − θ_L)·1/(1 + exp(−z)).
For each layer l = L−1, ..., 2, the partial derivatives with respect to the connection weight parameters, the bias parameters and the slope-control parameters are:

∇_{W^l} J(W,b,a) = δ^(l+1)·(s^l)^T·a^l

∇_{b^l} J(W,b,a) = δ^(l+1)·a^l

∇_{a^l} J(W,b,a) = δ^(l+1)·(h^(l+1))^T

where:

h^l = W^(l−1)·s^(l−1) + b^(l−1) + σ·N(0,1)

z^l = a^(l−1)·h^l

s^l = f(z^l)
The gradients above are the gradient update for a single sample in the data set; to train on the whole data set one simply sums the per-sample gradients and takes their average. After the gradient values of all parameters are obtained, each parameter is optimized with a quasi-Newton optimization algorithm.
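A minimal sketch of this fine-tuning loop, under the assumption that a per-sample loss-and-gradient routine model_loss_and_grad is available and that SciPy's L-BFGS-B stands in for the unspecified quasi-Newton method, looks as follows:

    import numpy as np
    from scipy.optimize import minimize

    def batch_objective(theta, samples, model_loss_and_grad):
        """Mean reconstruction loss and mean gradient over the whole training set."""
        losses, grads = zip(*(model_loss_and_grad(theta, x) for x in samples))
        return float(np.mean(losses)), np.mean(grads, axis=0)

    def finetune(theta0, samples, model_loss_and_grad, max_iter=200):
        # L-BFGS-B is a limited-memory quasi-Newton optimizer available in SciPy.
        result = minimize(batch_objective, theta0,
                          args=(samples, model_loss_and_grad),
                          jac=True, method="L-BFGS-B",
                          options={"maxiter": max_iter})
        return result.x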
In step 2), the specific method for constructing and training the Fisher layer network is as follows:
The Gaussian mixture model is simplified with two assumptions:
(1) every Gaussian component in the GMM has equal weight, i.e. ω_k = 1;
(2) u_k(x) is simplified to the form

u_k(x) = exp( −(1/2)·‖(x − μ_k)/σ_k‖² ),

which is equivalent to assuming that all covariance matrices have the same determinant value. The simplified posterior γ_j(k) is

γ_j(k) = u_k(x_j) / Σ_n u_n(x_j).

Suppose w_k = 1/σ_k and b_k = −μ_k; the final Fisher layer then has the form

g_μk = (1/t)·Σ_j γ_j(k)·( w_k ⊙ (x_j + b_k) )

g_σk = (1/(t·√2))·Σ_j γ_j(k)·( (w_k ⊙ (x_j + b_k))² − 1 )

where ⊙ denotes an element-wise operation; γ_j(k) is a softmax function and w_k, b_k are the parameters of the k-th Gaussian component of the GMM. γ_j(k) contains the common computation term w_n ⊙ (x_ij + b_n), which is differentiable, and the remaining computations are linear or squaring operations, which are also differentiable, so the parameters can be learned through the back-propagation algorithm.
Because the simplified Fisher Vector algorithm is composed of these differentiable operations, the gradient of the error function with respect to all weights and bias values can be computed during network training by gradient descent with error back-propagation. In the large-scale image retrieval problem, the adaptive global binary feature acts in the first stage of the retrieval process: Hamming-distance matching with the global feature is performed in the server-side database to obtain a candidate set. The cross-entropy loss function is chosen:

L = −Σ_i Σ_c [ y_ic·log σ(s_ic) + (1 − y_ic)·log(1 − σ(s_ic)) ]

where s_i = [s_i1, ..., s_iC]^T is the score vector of image X_i; y_i = [y_i1, ..., y_iC]^T is the label vector; C is the number of classes in the dataset; and σ(x) is the sigmoid function, i.e.:

σ(x) = 1/(1 + exp(−x)).
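To make the simplified Fisher layer concrete, the following sketch (shapes and normalization constants are illustrative assumptions, not taken verbatim from the patent) computes the soft assignments γ_j(k) and the aggregated mean- and variance-gradient vectors for a set of reduced local descriptors:

    import numpy as np

    def fisher_layer(X, W, B):
        """X: (t, d) local descriptors; W, B: (K, d) per-component parameters
        with w_k = 1/sigma_k and b_k = -mu_k. Returns the concatenated
        mean/variance gradient vectors of length 2*K*d."""
        t = X.shape[0]
        # Common term w_k ⊙ (x_j + b_k) for every descriptor j and component k: (t, K, d)
        Y = W[None, :, :] * (X[:, None, :] + B[None, :, :])
        # Soft assignment gamma_j(k) via a softmax over -0.5 * ||.||^2
        logits = -0.5 * np.sum(Y ** 2, axis=2)          # (t, K)
        logits -= logits.max(axis=1, keepdims=True)      # numerical stability
        gamma = np.exp(logits)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # Aggregated mean- and variance-gradient statistics per component: (K, d)
        g_mu = np.einsum('tk,tkd->kd', gamma, Y) / t
        g_var = np.einsum('tk,tkd->kd', gamma, Y ** 2 - 1.0) / (t * np.sqrt(2.0))
        return np.concatenate([g_mu.ravel(), g_var.ravel()])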
the invention has the following advantages:
the invention uses a nonlinear dimension reduction algorithm CRBM to reduce the dimension of the local features of one image, and improves the effect of the aggregated global features by reducing the loss of the dimension reduction algorithm to the local feature information; and meanwhile, a Fisher network based on learning is adopted to generate more efficient Fisher Vector aggregation characteristics.
The invention provides a lightweight, efficient image retrieval system that can be deployed on a mobile terminal. In the algorithm that aggregates local features into a global compact binary feature, the nonlinear dimensionality-reduction algorithm CRBM is used to find the essential subspace information of the non-Gaussian-distributed local features, and a Fisher-based network structure is used to aggregate the Fisher Vector, yielding a more discriminative global feature. A scalar quantization algorithm and a bit-adaptive algorithm produce compact, adaptive features, so the length of the transmitted image feature information can be selected adaptively according to the mobile terminal's network bandwidth. In the retrieval stage, the global feature is used for coarse matching to obtain a candidate set, and the local features are used for a geometric consistency check to perform exact matching, so the method adapts well to large-scale image retrieval tasks.
Drawings
FIG. 1 is a diagram showing the structure of a Fisher network according to the present invention.
Fig. 2 is a global binary compact feature aggregation flow diagram.
Detailed Description
The trained CRBM network and Fisher network are used to aggregate the global compact binary feature with the following algorithm:
1) Input:
a) the offline-trained GMM model;
b) the set of local SIFT features of image X, {x_j, j = 1, ..., t}.
2) For each local feature x_j in the SIFT set {x_j, j = 1, ..., t} of image X:
3) use the continuous restricted Boltzmann machine to reduce x_j from 128 dimensions to 32 dimensions;
4) end of loop.
5) For each Gaussian component i:
6) for each local SIFT descriptor j:
7) compute the posterior probability γ_j(i) of local feature x_j with respect to the i-th Gaussian component;
8) end of inner loop;
9) aggregate over all local features the Gaussian mean-gradient vector g_μi and the variance-gradient vector g_σi (each 32-dimensional, i = 1, ..., 512);
10) end of loop.
11) Apply the SCFV scalar quantization method to the aggregated Fisher vector to obtain the binary global feature.
12) Apply the bit-adaptive algorithm to obtain compact descriptors g corresponding to 512-byte, 1 KB, 2 KB, 4 KB, 8 KB and 16 KB bitstreams.
Output: the Fisher 0/1 binarized scalable compact descriptor.
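The pipeline above can be outlined with the following sketch; crbm_reduce, the sign-based quantization and the truncation step are hypothetical stand-ins for the trained CRBM, the SCFV scalar quantizer and the bit-adaptive algorithm, and fisher_layer refers to the earlier Fisher layer sketch:

    import numpy as np

    def aggregate_global_descriptor(sift_descriptors, crbm_reduce, fisher_layer,
                                    fisher_params,
                                    target_bytes=(512, 1024, 2048, 4096, 8192, 16384)):
        W, B = fisher_params                                   # (K, 32) each, K = 512
        # Steps 2-4: reduce every 128-D SIFT descriptor to 32 dimensions with the CRBM.
        X = np.stack([crbm_reduce(x) for x in sift_descriptors])      # (t, 32)
        # Steps 5-10: posteriors and aggregated mean/variance gradients (Fisher layer).
        fv = fisher_layer(X, W, B)                             # (2 * K * 32,)
        # Step 11: scalar quantization to a binary global feature
        # (a simple sign-based stand-in for SCFV quantization).
        binary_fv = (fv > 0).astype(np.uint8)
        # Step 12: bit-adaptive selection, approximated here by truncating the
        # bit string to the requested bitstream lengths.
        return {nbytes: binary_fv[:nbytes * 8] for nbytes in target_bytes}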
The retrieval process is as follows:
after the local features extracted from the query image and the binary scalable compact global features, the retrieval stage is divided into two steps. The first step is as follows: matching a candidate set for the binary scalable compact global features of the image in a server database by using a Hamming distance; the second step: the candidate images are matched exactly using a geometric consistency check on the local features in the candidate set. And returns the search result.
The Fisher network structure of the invention is shown in Fig. 1; the computation performed by each module is as follows:
(1) Module 1 computes y_ijk_1 = w_k ⊙ (x_ij + b_k); for the feature vector x_ij it outputs {y_ij1_1, y_ij2_1, ..., y_ij512_1}, one value per Gaussian component.
(2) Module 2 computes y_ijk_2 = (y_ijk_1)², the element-wise square of {y_ij1_1, y_ij2_1, ..., y_ij512_1}, and outputs {y_ij1_2, y_ij2_2, ..., y_ij512_2}.
(3) Module 3 applies its formula (shown in Fig. 1) and outputs {y_ij1_3, y_ij2_3, ..., y_ij512_3}.
(4) Module 4 applies its formula (shown in Fig. 1) and outputs {y_ij1_4, y_ij2_4, ..., y_ij512_4}.
(5) Module 5 applies its formula (shown in Fig. 1) and outputs the posteriors {γ_j(1), γ_j(2), ..., γ_j(k)}.
(6) Module 6 computes y_ijk_5 = y_ijk_2 − 1 and outputs {y_ij1_5, y_ij2_5, ..., y_ij512_5}.
(7) Module 7 applies its formula (shown in Fig. 1) and outputs {y_ij1_6, y_ij2_6, ..., y_ij512_6}.
(8) Module 8 applies its aggregation formula (shown in Fig. 1) and produces its aggregated output vector.
(9) Module 9 applies its aggregation formula (shown in Fig. 1) and produces its aggregated output vector.

Claims (3)

1. A mobile visual search framework based on a CRBM and a Fisher network, characterized by comprising the following steps:
1) constructing and training a continuous restricted Boltzmann machine network;
2) constructing and training a Fisher layer network.
2. The CRBM and Fisher network-based mobile visual search framework of claim 1, wherein in step 1), the specific method for constructing and training the continuous restricted Boltzmann machine network is as follows:
(1) Construction: a 3-layer continuous restricted Boltzmann machine network is built, in which the first layer has 128 units, the second layer has 64 units and the third layer has 32 units; for each pair of adjacent layers, the former layer serves as the visible units and the latter layer as the hidden units. The visible units and hidden units are fully connected, with connection weights {w}. A continuous restricted Boltzmann machine adds a zero-mean Gaussian-noise continuous stochastic component inside the sigmoid activation of the visible-layer units of an RBM network; its structure is otherwise the same as an RBM, consisting of a visible layer and a hidden layer whose units are connected across layers. Information can flow in both directions when the network is trained and used, and the weights in the two directions are equal, i.e. w_ij = w_ji. Let s_j denote the output of neuron j and {s_i} the states of the input neurons, with the hidden-layer state h_j represented by s_j and the visible-layer state v_i represented by s_i; then:

s_j = φ_j( Σ_i w_ij·s_i + n_j )

where the noise component n_j = σ·N_j(0,1), N_j(0,1) is a Gaussian random variable with mean 0 and variance 1, and the constant σ scales it, so that n_j has the probability distribution

p(n_j) = (1/√(2πσ²))·exp(−n_j²/(2σ²));

φ_j(·) is the sigmoid-shaped function

φ_j(x) = θ_L + (θ_H − θ_L)·1/(1 + exp(−a_j·x)),

where θ_L and θ_H are its lower and upper asymptotes and the parameter a_j controls its slope. As a_j grows from small to large, the unit transitions smoothly from a noise-free deterministic state to a binary stochastic state; if a_j keeps the sigmoid effectively linear over the noise range, then s_j follows a Gaussian distribution whose mean is determined by the weighted input Σ_i w_ij·s_i and whose variance is σ².
(2) Training:
The CRBM network parameters are trained with a minimizing-contrastive-divergence (MCD) weight-update algorithm, which requires only simple additions and multiplications. The MCD training criterion updates the connection weights {w_ij} and the slope-control parameters {a_j} of the sigmoid function:

Δw_ij ∝ ⟨s_i·s_j⟩ − ⟨ŝ_i·ŝ_j⟩

where ŝ_j denotes the one-step sampled (reconstructed) state of neuron j and ⟨·⟩ denotes the mean over the training set. The simplified a_j update rule is:

Δa_j ∝ (1/a_j²)·(⟨s_j²⟩ − ⟨ŝ_j²⟩)
(3) Supervised fine-tuning:
The weights obtained after the CRBM is trained with the contrastive divergence algorithm are already close to the global optimum, and a back-propagation algorithm is used for fine-tuning. The desired output target {V'_i} is set equal to the input data {V_i}; the error between the model output and the input is used to adjust every weight gradient until the error converges. The parameters to be adjusted are the connection weights between layers, the bias weights of each layer, and the sigmoid slope parameters a_j. The objective function is

J(W,b,a) = (1/2)·‖F_W,b,a(x) − x‖²

where x is the network input data value and F_W,b,a(x) is the network output value. For each output neuron i of the L-th (output) layer, the residual is

δ_i^(L) = −(x_i − F_W,b,a(x)_i)·f′(z_i^(L)).

For l = L−1, ..., 2, the residual of the i-th neuron node in layer l is

δ_i^(l) = ( Σ_j w_ij^(l)·δ_j^(l+1) )·f′(z_i^(l)),

where f(z_i) is the neuron activation function

f(z) = θ_L + (θ_H − θ_L)·1/(1 + exp(−z)).

For each layer l = L−1, ..., 2, the partial derivatives with respect to the connection weight parameters, the bias parameters and the slope-control parameters are:

∇_{W^l} J(W,b,a) = δ^(l+1)·(s^l)^T·a^l

∇_{b^l} J(W,b,a) = δ^(l+1)·a^l

∇_{a^l} J(W,b,a) = δ^(l+1)·(h^(l+1))^T

where:

h^l = W^(l−1)·s^(l−1) + b^(l−1) + σ·N(0,1)

z^l = a^(l−1)·h^l

s^l = f(z^l)

The gradients above are the gradient update for a single sample in the data set; to train on the whole data set one simply sums the per-sample gradients and takes their average. After the gradient values of all parameters are obtained, each parameter is optimized with a quasi-Newton optimization algorithm.
3. The CRBM and Fisher network-based mobile visual search framework of claim 1, wherein in step 2), the specific method for constructing and training the Fisher layer network is as follows:
The Gaussian mixture model is simplified with two assumptions:
(1) every Gaussian component in the GMM has equal weight, i.e. ω_k = 1;
(2) u_k(x) is simplified to the form

u_k(x) = exp( −(1/2)·‖(x − μ_k)/σ_k‖² ),

which is equivalent to assuming that all covariance matrices have the same determinant value. The simplified posterior γ_j(k) is

γ_j(k) = u_k(x_j) / Σ_n u_n(x_j).

Suppose w_k = 1/σ_k and b_k = −μ_k; the final Fisher layer then has the form

g_μk = (1/t)·Σ_j γ_j(k)·( w_k ⊙ (x_j + b_k) )

g_σk = (1/(t·√2))·Σ_j γ_j(k)·( (w_k ⊙ (x_j + b_k))² − 1 )

where ⊙ denotes an element-wise operation; γ_j(k) is a softmax function and w_k, b_k are the parameters of the k-th Gaussian component of the GMM. γ_j(k) contains the common computation term w_n ⊙ (x_ij + b_n), which is differentiable, and the remaining computations are linear or squaring operations, which are also differentiable, so the parameters can be learned through the back-propagation algorithm.
Because the simplified Fisher Vector algorithm is composed of these differentiable operations, the gradient of the error function with respect to all weights and bias values can be computed during network training by gradient descent with error back-propagation. CDVS mainly addresses large-scale image retrieval and image matching; in the large-scale image retrieval problem, the adaptive global binary feature acts in the first stage of the retrieval process: Hamming-distance matching with the global feature is performed in the server-side database to obtain a candidate set. The cross-entropy loss function is chosen:

L = −Σ_i Σ_c [ y_ic·log σ(s_ic) + (1 − y_ic)·log(1 − σ(s_ic)) ]

where s_i = [s_i1, ..., s_iC]^T is the score vector of image X_i; y_i = [y_i1, ..., y_iC]^T is the label vector; C is the number of classes in the dataset; and σ(x) is the sigmoid function, i.e.:

σ(x) = 1/(1 + exp(−x)).
CN201711493995.XA 2017-12-31 2017-12-31 Moving-vision search framework based on CRBM and Fisher networks Pending CN108108770A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711493995.XA CN108108770A (en) 2017-12-31 2017-12-31 Moving-vision search framework based on CRBM and Fisher networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711493995.XA CN108108770A (en) 2017-12-31 2017-12-31 Moving-vision search framework based on CRBM and Fisher networks

Publications (1)

Publication Number Publication Date
CN108108770A true CN108108770A (en) 2018-06-01

Family

ID=62215223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711493995.XA Pending CN108108770A (en) 2017-12-31 2017-12-31 Moving-vision search framework based on CRBM and Fisher networks

Country Status (1)

Country Link
CN (1) CN108108770A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920727A (en) * 2018-08-03 2018-11-30 厦门大学 Compact visual in vision retrieval describes sub- deep neural network and generates model
CN109657800A (en) * 2018-11-30 2019-04-19 清华大学深圳研究生院 Intensified learning model optimization method and device based on parametric noise
CN112926273A (en) * 2021-04-13 2021-06-08 中国人民解放***箭军工程大学 Method for predicting residual life of multivariate degradation equipment
CN113780301A (en) * 2021-07-26 2021-12-10 天津大学 Self-adaptive denoising machine learning application method for defending against attack

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101383023A (en) * 2008-10-22 2009-03-11 西安交通大学 Neural network short-term electric load prediction based on sample dynamic organization and temperature compensation
CN102521607A (en) * 2011-11-30 2012-06-27 西安交通大学 Near-optimal skin-color detection method under Gaussian frame
CN102708383A (en) * 2012-05-21 2012-10-03 广州像素数据技术开发有限公司 System and method for detecting living face with multi-mode contrast function
US20140198998A1 (en) * 2013-01-14 2014-07-17 Samsung Electronics Co., Ltd. Novel criteria for gaussian mixture model cluster selection in scalable compressed fisher vector (scfv) global descriptor
CN106405640A (en) * 2016-08-26 2017-02-15 中国矿业大学(北京) Automatic microseismic signal arrival time picking method based on depth belief neural network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101383023A (en) * 2008-10-22 2009-03-11 西安交通大学 Neural network short-term electric load prediction based on sample dynamic organization and temperature compensation
CN102521607A (en) * 2011-11-30 2012-06-27 西安交通大学 Near-optimal skin-color detection method under Gaussian frame
CN102708383A (en) * 2012-05-21 2012-10-03 广州像素数据技术开发有限公司 System and method for detecting living face with multi-mode contrast function
US20140198998A1 (en) * 2013-01-14 2014-07-17 Samsung Electronics Co., Ltd. Novel criteria for gaussian mixture model cluster selection in scalable compressed fisher vector (scfv) global descriptor
CN106405640A (en) * 2016-08-26 2017-02-15 中国矿业大学(北京) Automatic microseismic signal arrival time picking method based on depth belief neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN HUANG et al.: "DEEP-BASED FISHER VECTOR FOR MOBILE VISUAL SEARCH", 2017 IEEE International Conference on Image Processing *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920727A (en) * 2018-08-03 2018-11-30 厦门大学 Compact visual in vision retrieval describes sub- deep neural network and generates model
CN109657800A (en) * 2018-11-30 2019-04-19 清华大学深圳研究生院 Intensified learning model optimization method and device based on parametric noise
CN112926273A (en) * 2021-04-13 2021-06-08 中国人民解放***箭军工程大学 Method for predicting residual life of multivariate degradation equipment
CN113780301A (en) * 2021-07-26 2021-12-10 天津大学 Self-adaptive denoising machine learning application method for defending against attack
CN113780301B (en) * 2021-07-26 2023-06-27 天津大学 Self-adaptive denoising machine learning application method for defending against attack

Similar Documents

Publication Publication Date Title
US20210201147A1 (en) Model training method, machine translation method, computer device, and storage medium
US12008810B2 (en) Video sequence selection method, computer device, and storage medium
US20210342643A1 (en) Method, apparatus, and electronic device for training place recognition model
CN108108770A (en) Moving-vision search framework based on CRBM and Fisher networks
WO2022199504A1 (en) Content identification method and apparatus, computer device and storage medium
CN116797684B (en) Image generation method, device, electronic equipment and storage medium
KR102604306B1 (en) Image table extraction method, apparatus, electronic device and storage medium
US20230401833A1 (en) Method, computer device, and storage medium, for feature fusion model training and sample retrieval
CN113962965B (en) Image quality evaluation method, device, equipment and storage medium
CN113254684B (en) Content aging determination method, related device, equipment and storage medium
CN113569129A (en) Click rate prediction model processing method, content recommendation method, device and equipment
CN110555102A (en) media title recognition method, device and storage medium
KR20220018633A (en) Image retrieval method and device
CN117635275B (en) Intelligent electronic commerce operation commodity management platform and method based on big data
CN110855487A (en) Network user similarity management method, device and storage medium
CN111988668B (en) Video recommendation method and device, computer equipment and storage medium
CN111695323B (en) Information processing method and device and electronic equipment
CN113420179A (en) Semantic reconstruction video description method based on time sequence Gaussian mixture hole convolution
CN111898658B (en) Image classification method and device and electronic equipment
CN110688508B (en) Image-text data expansion method and device and electronic equipment
CN114298961A (en) Image processing method, device, equipment and storage medium
CN113822291A (en) Image processing method, device, equipment and storage medium
CN115269901A (en) Method, device and equipment for generating extended image
CN114329236A (en) Data processing method and device
CN112287697A (en) Method for accelerating running speed of translation software in small intelligent mobile equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180601