CN109165679A - Data processing method and device - Google Patents

Publication number
CN109165679A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810859161.4A
Other languages
Chinese (zh)
Other versions
CN109165679B (en)
Inventor
许明微
李琳
吴耀华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
Original Assignee
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
Application filed by Migu Cultural Technology Co Ltd and China Mobile Communications Group Co Ltd
Priority to CN201810859161.4A
Publication of CN109165679A
Application granted
Publication of CN109165679B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24143 Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a data processing method, which comprises the following steps: determining an optimal alignment matrix according to the aggregation degree of the source domain samples in the aligned space, so that the aggregation degree of first source domain samples belonging to the same category in the source domain samples in the aligned space is maximum; aligning the subspace of the source domain sample and the subspace of the target domain sample by using the optimal alignment matrix to obtain a source domain data set and a target domain data set; training a nearest neighbor classifier according to the source domain data set and the target domain data set to obtain a trained classifier; and carrying out classification and identification on the samples without the labels in the target domain samples by using the trained classifier. The invention also discloses a data processing device.

Description

Data processing method and device
Technical field
The present invention relates to data processing technology, and in particular to a data processing method and device.
Background
In the prior art, the domain from which training data is drawn is usually called the source domain, and the domain from which test data is drawn is called the target domain. When the prior-art neighborhood-preserving kernel space alignment (NPKSA) method is used to improve image recognition efficiency, a kernel mapping function maps the source domain images and target domain images into the same high-dimensional space, so that the source domain and the target domain are linearly separable in that space. The source domain images and target domain images in the high-dimensional space are then reduced in dimension by principal component analysis (PCA), which yields a source domain subspace and a target domain subspace. Next, an alignment matrix is learned to align the source domain subspace with the target domain subspace, ensuring that source domain samples belonging to different classes in the original space are separated as far as possible in the aligned space. Finally, the learned alignment matrix is used to classify new images.
However, the prior art uses only the information about source domain samples of different classes in the original space, so that samples of different classes are dispersed as far as possible in the aligned space. As a result, samples of the same class in the original space may also end up widely dispersed in the aligned space (as shown in Fig. 1). This hampers training of the classifier and reduces its accuracy when classifying and recognizing images.
Fig. 1 is a schematic diagram of the training effect of the prior-art neighborhood-preserving kernel space alignment (NPKSA) method.
As shown in Fig. 1, source domain samples that belong to the same class in source domain subspace 101 remain widely dispersed in the aligned space 102 after spatial alignment with the alignment matrix of the NPKSA method, which results in low accuracy during image classification.
Summary of the invention
To solve the existing technical problem, embodiments of the present invention aim to provide a data processing method that can improve the accuracy of image recognition.
The technical solution of the embodiments of the present invention is implemented as follows:
According to one aspect of the embodiments of the present invention, a data processing method is provided, the method comprising:
determining an optimal alignment matrix according to the degree of aggregation of source domain samples in the aligned space, so that the degree of aggregation in the aligned space of first source domain samples, which are the source domain samples belonging to the same class, is maximized;
aligning the subspace of the source domain samples with the subspace of target domain samples using the optimal alignment matrix, to obtain a source domain data set and a target domain data set;
training a nearest neighbor classifier according to the source domain data set and the target domain data set, to obtain a trained classifier;
classifying and recognizing the unlabeled samples among the target domain samples using the trained classifier.
In the above scheme, the method further comprises:
determining the optimal alignment matrix according to the degree of aggregation of the source domain samples in the aligned space, so that the degree of aggregation in the aligned space of second source domain samples, which are the source domain samples belonging to different classes, is minimized.
In the above scheme, determining the optimal alignment matrix according to the degree of aggregation of the source domain samples in the aligned space, so that the degree of aggregation in the aligned space of the first source domain samples belonging to the same class is maximized, comprises:
calculating the optimal alignment matrix so that the difference between the mean of the i-th class of samples among the source domain samples and the mean of all source domain samples is minimized.
In the above scheme, determining the optimal alignment matrix according to the degree of aggregation of the source domain samples in the aligned space, so that the degree of aggregation in the aligned space of the second source domain samples belonging to different classes is minimized, comprises:
calculating the optimal alignment matrix so that the difference between the mean of the i-th class of samples among the source domain samples and the mean of all source domain samples is maximized.
In the above scheme, determining the optimal alignment matrix according to the degree of aggregation of the source domain samples in the aligned space comprises:
constructing a first scatter matrix, in the aligned space, of the first source domain samples belonging to the same class, the first scatter matrix characterizing the maximum degree of aggregation of the first source domain samples in the aligned space;
constructing a second scatter matrix, in the aligned space, of the second source domain samples belonging to different classes, the second scatter matrix characterizing the minimum degree of aggregation of the second source domain samples in the aligned space;
determining the optimal alignment matrix according to the first scatter matrix and the second scatter matrix.
In the above scheme, the expression of the first scatter matrix S_w is constructed by formula (1), and the expression of the second scatter matrix S_b is constructed by formula (2), together with the definition given in formula (3),
where C denotes the number of classes to which the source domain samples S and the target domain samples T belong in the original space; n_i denotes the number of samples in the i-th class; μ_i denotes the mean of the i-th class of samples; μ denotes the mean of all samples; P_s denotes the source domain subspace; M denotes the alignment matrix between the source domain subspace P_s and the target domain subspace P_t; the sample symbol denotes a specific sample; and the superscript T denotes the matrix transpose.
In the above scheme, the expression of the optimal alignment matrix is constructed by formula (5),
where λ, β ∈ (0, +∞] denote regularization constants; the first term denotes the distribution difference between the subspace of the source domain samples and the subspace of the target domain samples; S_w denotes the degree of aggregation, in the aligned space, of the first source domain samples belonging to the same class; S_b denotes the degree of dispersion, in the aligned space, of the second source domain samples belonging to different classes; M denotes the optimal alignment matrix; tr(S_w) is the trace of the matrix S_w, weighted by λ; and tr(S_b) is the trace of the matrix S_b, weighted by β.
In the above scheme, before the optimal alignment matrix is determined according to the degree of aggregation of the source domain samples in the aligned space, the method further comprises:
performing dimension reduction separately on the source domain samples and the target domain samples in the original space using principal component analysis (PCA), to obtain the subspace of the source domain samples and the subspace of the target domain samples.
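The PCA step above can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the function name `pca_subspace`, the one-sample-per-column layout, the random data, and the subspace dimension `d=3` are all assumptions made for the example.

```python
import numpy as np

def pca_subspace(X, d):
    """Return an n x d orthonormal basis spanning the top-d principal directions.

    X is assumed to be n x m with one sample per column (an assumption of this sketch).
    """
    Xc = X - X.mean(axis=1, keepdims=True)  # center each feature
    # The left singular vectors of the centered data are the eigenvectors of the
    # covariance matrix, already ordered by decreasing singular value.
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :d]

rng = np.random.default_rng(0)
X_s = rng.normal(size=(10, 40))        # hypothetical source domain samples
X_t = rng.normal(size=(10, 30)) + 0.5  # hypothetical target domain samples
P_s = pca_subspace(X_s, d=3)           # source domain subspace
P_t = pca_subspace(X_t, d=3)           # target domain subspace
```

Each returned basis has orthonormal columns, so projecting a sample x into the subspace is simply the product of the transposed basis with x.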
According to another aspect of the embodiments of the present invention, a data processing device is provided, the device comprising: a matrix determination unit, a spatial alignment unit, a training unit, and a recognition unit;
wherein the matrix determination unit is configured to determine an optimal alignment matrix according to the degree of aggregation of source domain samples in the aligned space, so that the degree of aggregation in the aligned space of first source domain samples, which are the source domain samples belonging to the same class, is maximized;
the spatial alignment unit is configured to align the subspace of the source domain samples with the subspace of target domain samples using the optimal alignment matrix, to obtain a source domain data set and a target domain data set;
the training unit is configured to train a nearest neighbor classifier according to the source domain data set and the target domain data set, to obtain a trained classifier;
the recognition unit is configured to classify and recognize the unlabeled samples among the target domain samples using the trained classifier.
According to a third aspect of the embodiments of the present invention, a data processing device is provided, the device comprising: a memory and a processor;
wherein the memory is configured to store a computer program that can run on the processor;
the processor is configured to perform, when running the computer program, the steps of any one of the data processing methods described above.
Embodiments of the present invention provide a data processing method and device. An optimal alignment matrix is determined according to the degree of aggregation of source domain samples in the aligned space, so that the degree of aggregation in the aligned space of first source domain samples belonging to the same class is maximized. The subspace of the source domain samples is aligned with the subspace of target domain samples using the optimal alignment matrix, to obtain a source domain data set and a target domain data set. A nearest neighbor classifier is trained according to the source domain data set and the target domain data set, to obtain a trained classifier. The unlabeled samples among the target domain samples are classified and recognized using the trained classifier. In this way, both the robustness of the classifier and the accuracy of image recognition are improved.
Brief description of the drawings
Fig. 1 is a schematic diagram of the training effect of the prior-art neighborhood-preserving kernel space alignment (NPKSA) method;
Fig. 2 is a schematic flowchart of the data processing method in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the training effect of the data processing method according to an embodiment of the present invention;
Fig. 4 is the first schematic structural diagram of the data processing device in an embodiment of the present invention;
Fig. 5 is the second schematic structural diagram of the data processing device in an embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.
Fig. 2 is a schematic flowchart of the data processing method in an embodiment of the present invention. The method may specifically be a neighborhood-preserving subspace alignment (NPSA) method for image recognition. As shown in Fig. 2, the method comprises:
Step 201: determine an optimal alignment matrix according to the degree of aggregation of source domain samples in the aligned space, so that the degree of aggregation in the aligned space of first source domain samples, which are the source domain samples belonging to the same class, is maximized.
Specifically, the optimal alignment matrix may be calculated so that the difference between the mean of the i-th class of samples among the source domain samples and the mean of all source domain samples is minimized. In this way, the degree of aggregation in the aligned space of the first source domain samples belonging to the same class is maximized.
In the embodiment of the present invention, the optimal alignment matrix may also be used so that the degree of aggregation in the aligned space of second source domain samples, which are the source domain samples belonging to different classes, is minimized.
Specifically, the optimal alignment matrix may be calculated so that the difference between the mean of the i-th class of samples among the source domain samples and the mean of all source domain samples is maximized. In this way, the degree of aggregation in the aligned space of the second source domain samples belonging to different classes is minimized.
In the embodiment of the present invention, when the optimal alignment matrix is determined according to the degree of aggregation of the source domain samples in the aligned space, a first scatter matrix S_w of the first source domain samples in the aligned space and a second scatter matrix S_b of the second source domain samples in the aligned space may be constructed, and the optimal alignment matrix is determined according to the first scatter matrix S_w and the second scatter matrix S_b.
The first scatter matrix S_w characterizes the maximum degree of aggregation of the first source domain samples in the aligned space; the second scatter matrix S_b characterizes the minimum degree of aggregation of the second source domain samples in the aligned space.
In the embodiment of the present invention, the expression of the first scatter matrix S_w may be constructed by formula (1), and the expression of the second scatter matrix S_b may be constructed by formula (2), together with the definition given in formula (3).
In formulas (1), (2) and (3), C denotes the number of classes to which the source domain samples S and the target domain samples T belong in the original space; n_i denotes the number of samples in the i-th class; μ_i denotes the mean of the i-th class of samples; μ denotes the mean of all samples; P_s denotes the source domain subspace; M denotes the alignment matrix between the source domain subspace P_s and the target domain subspace P_t; the sample symbol denotes the k-th sample of the i-th class; i indexes the class of the sample; k indexes the sample within a class; and the superscript T denotes the matrix transpose.
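Formulas (1) to (3) are not legible in this text. One plausible form, consistent with the symbol definitions above and with the standard within-class and between-class scatter matrices of linear discriminant analysis, is sketched below. The symbols x_k^i (a sample in the original space) and y_k^i (its image in the aligned space) are assumptions introduced for this sketch, as is the reading that μ_i and μ are means taken in the aligned space.

```latex
\begin{aligned}
S_w &= \sum_{i=1}^{C} \sum_{k=1}^{n_i} \left(y_k^i - \mu_i\right)\left(y_k^i - \mu_i\right)^T \\
S_b &= \sum_{i=1}^{C} n_i \left(\mu_i - \mu\right)\left(\mu_i - \mu\right)^T \\
y_k^i &= \left(P_s M\right)^T x_k^i
\end{aligned}
```

Under this reading, a small tr(S_w) means each sample lies close to its own class mean, and a large tr(S_b) means the class means lie far from the overall mean.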
In the embodiment of the present invention, S_w and S_b are used so that the difference between each source domain sample and the mean μ_i of its own class is minimized, and the difference between the mean of each class and the mean of all samples is maximized. As a result, source domain samples that belong to the same class in the original space aggregate as much as possible in the aligned space, while source domain samples that belong to different classes and are not adjacent disperse as much as possible in the aligned space, which ultimately improves the accuracy with which the trained model recognizes images.
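The two quantities described above can be illustrated with a short sketch that computes a within-class scatter matrix and a between-class scatter matrix for labeled samples. It assumes the standard linear-discriminant-analysis definitions, which may differ in detail from the patent's formulas (1) and (2); the function name, the column-per-sample layout, and the synthetic data are hypothetical.

```python
import numpy as np

def scatter_matrices(Y, labels):
    """Within-class scatter S_w and between-class scatter S_b.

    Y holds one sample per column; labels[k] is the class of column k.
    """
    d = Y.shape[0]
    mu = Y.mean(axis=1, keepdims=True)              # mean of all samples
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(labels):
        Y_c = Y[:, labels == c]                     # samples of class c
        mu_c = Y_c.mean(axis=1, keepdims=True)      # class mean
        diff = Y_c - mu_c
        S_w += diff @ diff.T                        # spread around the class mean
        S_b += Y_c.shape[1] * (mu_c - mu) @ (mu_c - mu).T  # class mean vs. overall mean
    return S_w, S_b

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)                # three hypothetical classes
Y = rng.normal(size=(4, 60)) + 3.0 * labels         # class-dependent offset
S_w, S_b = scatter_matrices(Y, labels)
```

A compact, well-separated labeling gives a small trace of `S_w` relative to the trace of `S_b`, which is the behavior the alignment matrix is meant to encourage.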
In the embodiment of the present invention, when the optimal alignment matrix is determined, the labeled source domain samples in the given original space may be denoted as a set of samples with their labels, in which each entry consists of the i-th sample in the source domain space and the label of that sample; R^(n×1) denotes the n-dimensional space of the source domain samples, C denotes the number of classes to which the source domain samples belong, and n_s denotes the number of samples in the source domain.
In the embodiment of the present invention, when the optimal alignment matrix is determined, the unlabeled target domain samples in the given original space may likewise be denoted as a set of samples, in which each entry is the j-th sample in the target domain, and n_t denotes the number of samples in the target domain. The total number of classes in the target domain is the same as in the source domain, namely C, and none of the samples in the target domain carries a label.
In the embodiment of the present invention, the expression of the optimal alignment matrix may be constructed by formula (5). Formula (5) makes it possible to maximize the degree of aggregation in the aligned space of the first source domain samples belonging to the same class, and to minimize the degree of aggregation in the aligned space of the second source domain samples belonging to different classes.
In formula (5), λ, β ∈ (0, +∞] denote regularization constants; the first term denotes the distribution difference between the subspace of the source domain samples and the subspace of the target domain samples; S_w denotes the degree of aggregation, in the aligned space, of the first source domain samples belonging to the same class; S_b denotes the degree of dispersion, in the aligned space, of the second source domain samples belonging to different classes; M denotes the optimal alignment matrix; tr(S_w) is the trace of the matrix S_w, weighted by λ; and tr(S_b) is the trace of the matrix S_b, weighted by β.
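Formula (5) is not legible in this text. A form consistent with the description of its three terms is sketched below; it is an assumption, not a quotation of the patent. In particular, the Frobenius-norm form of the first term and the minus sign on the third term are inferred from the stated goals (small subspace mismatch, small within-class scatter, large between-class scatter).

```latex
M^{*} = \arg\min_{M} \; \left\lVert P_s M - P_t \right\rVert_F^2
        + \lambda \, \mathrm{tr}\left(S_w\right)
        - \beta \, \mathrm{tr}\left(S_b\right)
```

With λ and β both close to zero this reduces to plain subspace alignment; increasing λ favors tighter classes, and increasing β favors larger separation between class means.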
In the neighborhood-preserving subspace alignment method of the embodiment of the present invention, by adjusting the optimal alignment matrix M and the regularization constants λ, β ∈ (0, +∞], the optimal alignment matrix M can be used so that the first source domain samples belonging to the same class aggregate as much as possible in the aligned space, and the second source domain samples belonging to different classes disperse as much as possible in the aligned space, thereby improving the training efficiency of the classifier.
In the embodiment of the present invention, after the expression of the optimal alignment matrix (formula (5)) is obtained, formula (5) also needs to be converted into a Lagrangian function f(M), from which the expression of the optimal alignment matrix M is then solved. The specific solution procedure is as follows:
First, using the relationship between the Frobenius norm and the trace of a matrix, the first term in formula (5) is converted into the following expression:
Second, the second term in formula (5) is expanded as:
where the set symbol denotes the set of source domain samples of the i-th class in the original space.
The second term in formula (5) is finally simplified to:
where the sample matrix denotes the set of all source domain samples in the original space, and L is a block diagonal matrix defined by its diagonal elements.
Third, the third term in formula (5) is expanded as:
where, as follows from formula (7), an intermediate identity holds, and therefore:
where D is an n_s × n_s diagonal matrix defined by its diagonal elements,
and W is likewise an n_s × n_s diagonal matrix defined by its elements.
The third term in formula (5) can finally be simplified to:
where G = D - W is an n_s × n_s Laplacian matrix.
Therefore, the final model (5) can be stated as:
Then, let:
Further, setting the derivative condition gives:
Finally, the expression of the optimal alignment matrix M is obtained as:
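Because the closed-form expression is not legible in this text, the following is only a sketch of how such a solution can be obtained under an assumed objective, f(M) = ||P_s M - P_t||_F^2 + λ tr(S_w) - β tr(S_b), with S_w and S_b computed on the samples projected by P_s M. Setting the gradient of this assumed f(M) to zero gives a linear system in M. The objective, the function names, and the data are all hypothetical and are not taken from the patent.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-class and between-class scatter of the columns of X."""
    n = X.shape[0]
    mu = X.mean(axis=1, keepdims=True)
    S_w, S_b = np.zeros((n, n)), np.zeros((n, n))
    for c in np.unique(labels):
        X_c = X[:, labels == c]
        mu_c = X_c.mean(axis=1, keepdims=True)
        S_w += (X_c - mu_c) @ (X_c - mu_c).T
        S_b += X_c.shape[1] * (mu_c - mu) @ (mu_c - mu).T
    return S_w, S_b

def pca_subspace(X, d):
    """Top-d principal directions of the columns of X (orthonormal basis)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :d]

def solve_alignment(P_s, P_t, X_s, labels, lam, beta):
    """Stationary point of the assumed objective, plus its gradient there."""
    S_w, S_b = scatter_matrices(X_s, labels)
    A = P_s.T @ S_w @ P_s          # tr(S_w) in the aligned space equals tr(M^T A M)
    B = P_s.T @ S_b @ P_s          # tr(S_b) in the aligned space equals tr(M^T B M)
    # Gradient: 2 P_s^T (P_s M - P_t) + 2 lam A M - 2 beta B M = 0
    lhs = P_s.T @ P_s + lam * A - beta * B
    M = np.linalg.solve(lhs, P_s.T @ P_t)
    grad = 2 * (P_s.T @ (P_s @ M - P_t) + lam * A @ M - beta * B @ M)
    return M, grad

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)
X_s = rng.normal(size=(10, 60)) + labels       # hypothetical labeled source samples
X_t = rng.normal(size=(10, 50)) + 0.5          # hypothetical unlabeled target samples
P_s, P_t = pca_subspace(X_s, 4), pca_subspace(X_t, 4)
M, grad = solve_alignment(P_s, P_t, X_s, labels, lam=1e-2, beta=1e-3)
```

Note that when β is large the matrix on the left-hand side may become indefinite, in which case the stationary point need not be a minimum; small values of β are used here to keep the system well conditioned.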
Step 202: align the subspace of the source domain samples with the subspace of the target domain samples using the optimal alignment matrix, to obtain a source domain data set and a target domain data set.
Specifically, the optimal alignment matrix is first used to project the source domain samples in the original space into the aligned space, so that the labels of the source domain samples coincide with the labels of the target domain samples in the aligned space, which yields the source domain data set of the source domain samples. Next, the target domain samples in the target domain space are projected directly into the target domain subspace, which yields the target domain data set of the target domain samples.
In the embodiment of the present invention, PCA is applied directly to the source domain images and target domain images in the original space to reduce their dimension, yielding the subspace of the source domain samples and the subspace of the target domain samples. Here, PCA refers to converting a set of possibly correlated variables, by an orthogonal transformation, into a set of linearly uncorrelated variables; the converted variables are the principal components.
Because methods for determining the number of PCA principal components are now mature, the embodiment of the present invention does not need to consider tuning the two parameters of subspace dimension and kernel function type.
Step 203: train a nearest neighbor classifier according to the source domain data set and the target domain data set, to obtain a trained classifier.
In the embodiment of the present invention, when the classifier is trained, the source domain samples, the target domain samples, the labels of the source domain samples, and the regularization constants may be used as input data. During training of the classifier, the input source domain samples, target domain samples, and source domain sample labels are used to adjust the regularization constants until the output labels of the target domain samples coincide with the labels of the source domain samples. Once the optimal alignment matrix has been determined, the classifier used to recognize images is thereby defined, so that images in the target domain can be classified and recognized by the trained classifier.
Step 204: classify and recognize the unlabeled samples among the target domain samples using the trained classifier.
In the embodiment of the present invention, after the optimal alignment matrix has been found, the optimal alignment matrix M and the source domain subspace P_s can be used to reduce the dimension of the source domain samples in the original space, giving the reduced source domain sample set. The reduced source domain sample set is then projected into the aligned space to obtain the source domain data set Y_s, where Y_s = (P_s M)^T X_s. Next, the target domain subspace P_t is used to project the target domain sample set in the high-dimensional space directly into the target domain subspace to obtain the target domain data set Y_t, where Y_t = P_t^T X_t. The source domain samples and target domain samples in the aligned space are then fed into the trained classifier, which, according to the source domain data set Y_s and the target domain data set Y_t, classifies and recognizes the unlabeled samples in the target domain and outputs the recognition result.
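The projection and classification steps just described can be sketched as follows, using the two stated relations Y_s = (P_s M)^T X_s and Y_t = P_t^T X_t and a plain one-nearest-neighbor rule. The tiny hand-made data, the identity-like choices for P_s, P_t and M, and the function names are assumptions chosen only to make the example easy to follow.

```python
import numpy as np

def project(P_s, P_t, M, X_s, X_t):
    """Project source samples into the aligned space and target samples into P_t."""
    Y_s = (P_s @ M).T @ X_s    # Y_s = (P_s M)^T X_s
    Y_t = P_t.T @ X_t          # Y_t = P_t^T X_t
    return Y_s, Y_t

def nearest_neighbor_predict(Y_s, labels_s, Y_t):
    """Give each target column the label of its closest source column."""
    # Squared Euclidean distance between every target and every source sample.
    d2 = ((Y_t.T[:, None, :] - Y_s.T[None, :, :]) ** 2).sum(axis=2)
    return labels_s[np.argmin(d2, axis=1)]

# Hypothetical 3-dimensional data with one sample per column.
X_s = np.array([[0.0, 0.1, 5.0, 5.1],
                [0.0, -0.1, 5.0, 4.9],
                [1.0, 1.0, 1.0, 1.0]])
labels_s = np.array([0, 0, 1, 1])
X_t = np.array([[0.2, 4.8],
                [0.1, 5.2],
                [0.0, 0.0]])

P_s = np.eye(3)[:, :2]   # stand-in source subspace (first two axes)
P_t = np.eye(3)[:, :2]   # stand-in target subspace
M = np.eye(2)            # stand-in alignment matrix

Y_s, Y_t = project(P_s, P_t, M, X_s, X_t)
pred = nearest_neighbor_predict(Y_s, labels_s, Y_t)
```

Here the first target sample lies next to the class-0 source samples and the second next to the class-1 source samples, so `pred` assigns them labels 0 and 1 respectively.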
In the data processing method provided by the embodiment of the present invention, PCA is first used to reduce the dimension of the source domain images and target domain images in the original space, yielding a source domain subspace and a target domain subspace. An optimal alignment matrix is then learned to align the source domain subspace with the target domain subspace, so that first source domain samples belonging to the same class in the original space aggregate as much as possible in the aligned space, and second source domain samples belonging to different classes in the original space disperse as much as possible in the aligned space. Finally, the learned optimal alignment matrix is used to train a nearest neighbor classifier, and the trained classifier classifies and recognizes the unlabeled samples in the target domain. In this way, the trained classifier model is more robust, and its image recognition in the target domain is more accurate.
Fig. 3 is a schematic diagram of the training effect of the data processing method according to an embodiment of the present invention. As shown in Fig. 3, source domain samples that belong to the same class in source domain subspace 301 remain tightly aggregated in the aligned space 302 after spatial alignment with the optimal alignment matrix of the data processing method provided by the embodiment of the present invention, so that the trained model achieves higher accuracy when classifying and recognizing images in the target domain.
Fig. 4 is the structure composition schematic diagram one of data processing equipment in the embodiment of the present invention, as shown in figure 4, described device It include: matrix determination unit 401, spatial alignment unit 402, training unit 403 and recognition unit 404;
Wherein, the matrix determination unit 401, for true according to the aggregation extent in source domain sample space after alignment Optimal alignment matrix is determined, so that belonging to the first source domain sample of the same category in the source domain sample, in space after alignment Aggregation extent it is maximum;
The spatial alignment unit 402, for using the optimal alignment matrix by the subspace of the source domain sample and The subspace of aiming field sample is aligned, and obtains source domain data set and aiming field data set;
The training unit 403, for being classified according to the source domain data set and the aiming field data set to arest neighbors Device is trained, the classifier after being trained;
The recognition unit 404, for using the classifier after training to the not sample of tape label in the aiming field sample This progress Classification and Identification.
In the embodiment of the present invention, described device further include: dimensionality reduction unit 405;
The dimensionality reduction unit 405, for utilizing PCA respectively to the source domain sample and the aiming field in luv space Sample carries out dimension-reduction treatment, obtains the subspace of the source domain sample and the subspace of the aiming field sample.
In the embodiment of the present invention, the matrix determination unit 401 is also used to according in source domain sample space after alignment Aggregation extent determine optimal alignment matrix so that the second source domain sample to belong to a different category in the source domain sample, right The aggregation extent in space after neat is minimum.
In the embodiment of the present invention, the matrix determination unit 401 is specifically also used to calculate optimal alignment matrix, so that described Difference in source domain sample in the mean value of the i-th class sample and the source domain sample between the mean value of all samples is minimum.
In the embodiment of the present invention, the matrix determination unit 401 is specifically also used to calculate optimal alignment matrix, so that described Difference in source domain sample in the mean value of the i-th class sample and the source domain sample between the mean value of all samples is maximum.
In the embodiment of the present invention, the matrix determination unit 401, which specifically is also used to construct in the source domain sample, belongs to phase The first scatter matrix in the space of the first generic source domain sample after alignment, first scatter matrix characterization described the Maximum aggregation extent in the space of one source domain sample after alignment;Construct second to belong to a different category in the source domain sample The second scatter matrix in the space of source domain sample after alignment, second scatter matrix characterize the second source domain sample and exist Minimized aggregation degree in space after alignment;According to first scatter matrix and second scatter matrix, determine described in Optimal alignment matrix.
In an embodiment of the present invention, the expression of the first scatter matrix S_w may specifically be constructed using the following formula (1), and the expression of the second scatter matrix S_b may specifically be constructed using the following formula (2):
Let:
where C denotes the number of classes to which the source domain samples S and the target domain samples T belong in the original space; n_i denotes the number of samples of the i-th class; μ_i denotes the mean of the samples of the i-th class; μ denotes the mean of all samples; P_s denotes the source domain subspace; M denotes the alignment matrix between the source domain subspace P_s and the target domain subspace P_t; the sample symbol denotes a specific sample; and the superscript T denotes matrix transposition.
In an embodiment of the present invention, the expression of the optimal alignment matrix may specifically be constructed by the following formula (5):
where λ, β ∈ (0, +∞] denote regularization constants; the alignment term denotes the distribution difference between the subspace of the source domain samples and the subspace of the target domain samples; S_w denotes the degree of aggregation, in the aligned space, of the first source domain samples, which belong to the same class among the source domain samples; S_b denotes the degree of dispersion, in the aligned space, of the second source domain samples, which belong to different classes among the source domain samples; M denotes the optimal alignment matrix; in λ tr(S_w), tr(S_w) is the trace of the matrix S_w; and in β tr(S_b), tr(S_b) is the trace of the matrix S_b.
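The formula images themselves are not reproduced in this text. As a reading aid only, one form that is consistent with the symbol definitions above is sketched below. It assumes Fisher-style scatter matrices computed on samples projected by P_s and then M, a Frobenius-norm alignment term, and the signs +λ and −β; the sample symbol x, the optimum symbol M^*, and the mapping to the numbers (1), (2), and (5) are likewise assumptions, not the verified formulas of the patent.

```latex
% Assumed reconstruction; x ranges over the source domain samples of class i.
\begin{align}
S_w &= \sum_{i=1}^{C} \sum_{x \in \text{class } i}
       \left(M^{T} P_s^{T} x - M^{T} P_s^{T} \mu_i\right)
       \left(M^{T} P_s^{T} x - M^{T} P_s^{T} \mu_i\right)^{T} \tag{1} \\
S_b &= \sum_{i=1}^{C} n_i
       \left(M^{T} P_s^{T} \mu_i - M^{T} P_s^{T} \mu\right)
       \left(M^{T} P_s^{T} \mu_i - M^{T} P_s^{T} \mu\right)^{T} \tag{2} \\
M^{*} &= \arg\min_{M} \; \left\lVert P_s M - P_t \right\rVert_F^{2}
       + \lambda \, \mathrm{tr}(S_w) - \beta \, \mathrm{tr}(S_b) \tag{5}
\end{align}
```

Under this reading, a small tr(S_w) means same-class samples cluster tightly after alignment, and a large tr(S_b) means the class means spread apart.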
It should be noted that when the data processing apparatus provided by the above embodiment performs image recognition, the division into the program modules described above is only an example. In practical applications, the processing may be allocated to different program modules as needed; that is, the internal structure of the data processing apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the data processing apparatus provided by the above embodiment and the data processing method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
Fig. 5 is a second schematic structural diagram of the data processing apparatus in an embodiment of the present invention. As shown in Fig. 5, the data processing apparatus 500 may be a mobile phone, a computer, a digital broadcast terminal, an information transceiver device, a game console, a tablet device, a personal digital assistant, an information push server, a content server, an identity authentication server, or the like. The data processing apparatus 500 shown in Fig. 5 includes at least one processor 501, a memory 502, at least one network interface 504, and a user interface 503. The components of the data processing apparatus 500 are coupled together through a bus system 505. It can be understood that the bus system 505 is used to implement connection and communication among these components. In addition to a data bus, the bus system 505 includes a power bus, a control bus, and a status signal bus. For clarity of description, however, the various buses are all labeled as the bus system 505 in Fig. 5.
The user interface 503 may include a display, a keyboard, a mouse, a trackball, a click wheel, keys, buttons, a touchpad, a touch screen, or the like.
It can be understood that the memory 502 may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM, Read Only Memory), a programmable read-only memory (PROM, Programmable Read-Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), an electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), a ferromagnetic random access memory (FRAM, ferromagnetic random access memory), a flash memory (Flash Memory), a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory may be a magnetic disk memory or a magnetic tape memory. The volatile memory may be a random access memory (RAM, Random Access Memory), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM, Static Random Access Memory), synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory), dynamic random access memory (DRAM, Dynamic Random Access Memory), synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), double data rate synchronous dynamic random access memory (DDRSDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), SyncLink dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory), and direct Rambus random access memory (DRRAM, Direct Rambus Random Access Memory). The memory 502 described in the embodiments of the present invention is intended to include, but is not limited to, these and any other suitable types of memory.
The memory 502 in the embodiments of the present invention is used to store various types of data to support the operation of the data processing apparatus 500. Examples of such data include any computer program that runs on the data processing apparatus 500, such as an operating system 5021 and an application program 5022. The operating system 5021 includes various system programs, such as a framework layer, a core library layer, and a driver layer, which are used to implement various basic services and to process hardware-based tasks. The application program 5022 may include various application programs, such as a media player (Media Player) and a browser (Browser), which are used to implement various application services. A program implementing the method of the embodiments of the present invention may be included in the application program 5022.
The methods disclosed in the above embodiments of the present invention may be applied to, or implemented by, the processor 501. The processor 501 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 501 or by instructions in the form of software. The processor 501 may be a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), or another programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like. The processor 501 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present invention may be executed and completed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium; the storage medium is located in the memory 502, and the processor 501 reads the information in the memory 502 and completes the steps of the foregoing method in combination with its hardware.
In an exemplary embodiment, the data processing apparatus 500 may be implemented by one or more application specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field-programmable gate arrays (FPGA, Field-Programmable Gate Array), general-purpose processors, controllers, microcontrollers (MCU, Micro Controller Unit), microprocessors (Microprocessor), or other electronic components, for executing the foregoing method.
Specifically, when the processor 501 runs the computer program, it executes: determining an optimal alignment matrix according to the degree of aggregation of the source domain samples in the aligned space, such that the first source domain samples, which belong to the same class among the source domain samples, have the maximum degree of aggregation in the aligned space;
aligning the subspace of the source domain samples with the subspace of the target domain samples using the optimal alignment matrix, to obtain a source domain data set and a target domain data set;
training a nearest neighbor classifier according to the source domain data set and the target domain data set, to obtain a trained classifier;
performing classification and recognition on the unlabeled samples among the target domain samples using the trained classifier.
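The alignment and classification steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, samples are stored as rows, the subspace bases P_s and P_t and the alignment matrix M are taken as given, and the nearest neighbor classifier is trained on labeled source samples only (labeled target samples, if available, could be appended to the training set).

```python
import numpy as np

def align_subspaces(Xs, Xt, Ps, Pt, M):
    """Project source samples through Ps @ M and target samples through Pt.

    Xs, Xt: (n_samples, d) arrays; Ps, Pt: (d, k) bases; M: (k, k) alignment matrix.
    Returns the aligned source data set and the target data set.
    """
    Zs = Xs @ Ps @ M
    Zt = Xt @ Pt
    return Zs, Zt

def nearest_neighbor_predict(Z_train, y_train, Z_query):
    """1-nearest-neighbor prediction using squared Euclidean distance."""
    # Pairwise squared distances, shape (n_query, n_train).
    d2 = ((Z_query[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=-1)
    return np.asarray(y_train)[np.argmin(d2, axis=1)]
```

For a quick check without an optimized matrix, M = Ps.T @ Pt (the closed-form solution of classical subspace alignment, which minimizes the Frobenius distance between Ps @ M and Pt) can be passed in; this stands in for, and is not the same as, the optimal alignment matrix described here.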
When the processor 501 runs the computer program, it further executes: determining the optimal alignment matrix according to the degree of aggregation of the source domain samples in the aligned space, such that the second source domain samples, which belong to different classes among the source domain samples, have the minimum degree of aggregation in the aligned space.
When the processor 501 runs the computer program, it further executes: calculating the optimal alignment matrix such that the difference between the mean of the i-th class of samples in the source domain samples and the mean of all samples in the source domain samples is minimized.
When the processor 501 runs the computer program, it further executes: calculating the optimal alignment matrix such that the difference between the mean of the i-th class of samples in the source domain samples and the mean of all samples in the source domain samples is maximized.
When the processor 501 runs the computer program, it further executes: constructing a first scatter matrix of the first source domain samples, which belong to the same class among the source domain samples, in the aligned space, the first scatter matrix characterizing the maximum degree of aggregation of the first source domain samples in the aligned space;
constructing a second scatter matrix of the second source domain samples, which belong to different classes among the source domain samples, in the aligned space, the second scatter matrix characterizing the minimum degree of aggregation of the second source domain samples in the aligned space;
determining the optimal alignment matrix according to the first scatter matrix and the second scatter matrix.
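The two scatter matrices can be illustrated with the standard within-class and between-class definitions from Fisher discriminant analysis. The sketch below assumes those standard definitions and a hypothetical function name; whether they match the patent's formulas (1) and (2) term for term cannot be confirmed from this text.

```python
import numpy as np

def scatter_matrices(Z, y):
    """Return (S_w, S_b) for samples Z of shape (n_samples, k) with class labels y.

    S_w sums the outer products of each sample's deviation from its class mean.
    S_b sums, per class, n_i times the outer product of the class mean's
    deviation from the overall mean.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y)
    k = Z.shape[1]
    mu = Z.mean(axis=0)                    # mean of all samples
    S_w = np.zeros((k, k))
    S_b = np.zeros((k, k))
    for c in np.unique(y):
        Zc = Z[y == c]                     # samples of one class
        mu_c = Zc.mean(axis=0)             # class mean
        centered = Zc - mu_c
        S_w += centered.T @ centered
        diff = (mu_c - mu)[:, None]
        S_b += Zc.shape[0] * (diff @ diff.T)
    return S_w, S_b
```

To evaluate the matrices in the aligned space, pass the projected source samples, for example `scatter_matrices(Xs @ Ps @ M, ys)` with samples as rows. A useful sanity check is that S_w + S_b equals the total scatter of the samples about their overall mean.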
The expression of the first scatter matrix S_w is constructed using the following formula (1), and the expression of the second scatter matrix S_b is constructed using the following formula (2):
Let:
where C denotes the number of classes to which the source domain samples S and the target domain samples T belong in the original space; n_i denotes the number of samples of the i-th class; μ_i denotes the mean of the samples of the i-th class; μ denotes the mean of all samples; P_s denotes the source domain subspace; M denotes the alignment matrix between the source domain subspace P_s and the target domain subspace P_t; the sample symbol denotes a specific sample; and the superscript T denotes matrix transposition.
The expression of the optimal alignment matrix is constructed by the following formula (5):
where λ, β ∈ (0, +∞] denote regularization constants; the alignment term denotes the distribution difference between the subspace of the source domain samples and the subspace of the target domain samples; S_w denotes the degree of aggregation, in the aligned space, of the first source domain samples, which belong to the same class among the source domain samples; S_b denotes the degree of dispersion, in the aligned space, of the second source domain samples, which belong to different classes among the source domain samples; M denotes the optimal alignment matrix; in λ tr(S_w), tr(S_w) is the trace of the matrix S_w; and in β tr(S_b), tr(S_b) is the trace of the matrix S_b.
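To make the role of the three terms concrete, the sketch below evaluates an objective of the kind described above for a candidate alignment matrix. It is an assumption-laden illustration: the distribution difference is taken to be the squared Frobenius norm of P_s M − P_t, the within-class trace enters with +λ and the between-class trace with −β, and the function name and argument order are hypothetical.

```python
import numpy as np

def alignment_objective(M, Ps, Pt, Xs, ys, lam, beta):
    """Evaluate ||Ps M - Pt||_F^2 + lam * tr(S_w) - beta * tr(S_b).

    Xs: (n_samples, d) labeled source samples (rows); ys: class labels.
    S_w and S_b are computed on the aligned samples Xs @ Ps @ M.
    """
    Z = Xs @ Ps @ M
    ys = np.asarray(ys)
    mu = Z.mean(axis=0)
    tr_sw = 0.0
    tr_sb = 0.0
    for c in np.unique(ys):
        Zc = Z[ys == c]
        mu_c = Zc.mean(axis=0)
        # The trace of a scatter matrix is the sum of squared deviations.
        tr_sw += ((Zc - mu_c) ** 2).sum()
        tr_sb += Zc.shape[0] * ((mu_c - mu) ** 2).sum()
    divergence = np.linalg.norm(Ps @ M - Pt, "fro") ** 2
    return divergence + lam * tr_sw - beta * tr_sb
```

A lower value indicates subspaces that are closer together, same-class samples that are more tightly clustered, and class means that are further apart. How the minimizing M is actually obtained is not shown here.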
When the processor 501 runs the computer program, it further executes: performing dimensionality reduction on the source domain samples and the target domain samples in the original space, respectively, using principal component analysis (PCA), to obtain the subspace of the source domain samples and the subspace of the target domain samples.
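The PCA step can be sketched with a singular value decomposition of the mean-centered data. The function name and the choice of the subspace dimension k are assumptions for illustration; the text does not state how the dimension is selected.

```python
import numpy as np

def pca_subspace(X, k):
    """Return a (d, k) orthonormal basis spanning the top-k principal directions of X.

    X: (n_samples, d) array with one sample per row.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each feature
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T
```

Calling the function once on the source domain samples and once on the target domain samples yields the two bases that play the roles of P_s and P_t.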
In an exemplary embodiment, an embodiment of the present invention further provides a computer-readable storage medium, for example a memory 502 that includes a computer program, where the computer program can be executed by the processor 501 of the data processing apparatus 500 to complete the steps of the foregoing method. The computer-readable storage medium may be a memory such as an FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, optical disc, or CD-ROM; it may also be any of various devices that include one or any combination of the above memories, such as a mobile phone, a computer, a tablet device, or a personal digital assistant.
A computer-readable storage medium has a computer program stored thereon; when the computer program is run by a processor, it executes: determining an optimal alignment matrix according to the degree of aggregation of the source domain samples in the aligned space, such that the first source domain samples, which belong to the same class among the source domain samples, have the maximum degree of aggregation in the aligned space;
aligning the subspace of the source domain samples with the subspace of the target domain samples using the optimal alignment matrix, to obtain a source domain data set and a target domain data set;
training a nearest neighbor classifier according to the source domain data set and the target domain data set, to obtain a trained classifier;
performing classification and recognition on the unlabeled samples among the target domain samples using the trained classifier.
When the computer program is run by the processor, it further executes: determining the optimal alignment matrix according to the degree of aggregation of the source domain samples in the aligned space, such that the second source domain samples, which belong to different classes among the source domain samples, have the minimum degree of aggregation in the aligned space.
When the computer program is run by the processor, it further executes: calculating the optimal alignment matrix such that the difference between the mean of the i-th class of samples in the source domain samples and the mean of all samples in the source domain samples is minimized.
When the computer program is run by the processor, it further executes: calculating the optimal alignment matrix such that the difference between the mean of the i-th class of samples in the source domain samples and the mean of all samples in the source domain samples is maximized.
When the computer program is run by the processor, it further executes: constructing a first scatter matrix of the first source domain samples, which belong to the same class among the source domain samples, in the aligned space, the first scatter matrix characterizing the maximum degree of aggregation of the first source domain samples in the aligned space;
constructing a second scatter matrix of the second source domain samples, which belong to different classes among the source domain samples, in the aligned space, the second scatter matrix characterizing the minimum degree of aggregation of the second source domain samples in the aligned space;
determining the optimal alignment matrix according to the first scatter matrix and the second scatter matrix.
The expression of the first scatter matrix S_w is constructed using the following formula (1), and the expression of the second scatter matrix S_b is constructed using the following formula (2):
Let:
where C denotes the number of classes to which the source domain samples S and the target domain samples T belong in the original space; n_i denotes the number of samples of the i-th class; μ_i denotes the mean of the samples of the i-th class; μ denotes the mean of all samples; P_s denotes the source domain subspace; M denotes the alignment matrix between the source domain subspace P_s and the target domain subspace P_t; the sample symbol denotes a specific sample; and the superscript T denotes matrix transposition.
The expression of the optimal alignment matrix is constructed by the following formula (5):
where λ, β ∈ (0, +∞] denote regularization constants; the alignment term denotes the distribution difference between the subspace of the source domain samples and the subspace of the target domain samples; S_w denotes the degree of aggregation, in the aligned space, of the first source domain samples, which belong to the same class among the source domain samples; S_b denotes the degree of dispersion, in the aligned space, of the second source domain samples, which belong to different classes among the source domain samples; M denotes the optimal alignment matrix; in λ tr(S_w), tr(S_w) is the trace of the matrix S_w; and in β tr(S_b), tr(S_b) is the trace of the matrix S_b.
When the computer program is run by the processor, it further executes: performing dimensionality reduction on the source domain samples and the target domain samples in the original space, respectively, using principal component analysis (PCA), to obtain the subspace of the source domain samples and the subspace of the target domain samples.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A data processing method, characterized in that the method comprises:
determining an optimal alignment matrix according to the degree of aggregation of source domain samples in the aligned space, such that first source domain samples, which belong to the same class among the source domain samples, have the maximum degree of aggregation in the aligned space;
aligning the subspace of the source domain samples with the subspace of target domain samples using the optimal alignment matrix, to obtain a source domain data set and a target domain data set;
training a nearest neighbor classifier according to the source domain data set and the target domain data set, to obtain a trained classifier;
performing classification and recognition on the unlabeled samples among the target domain samples using the trained classifier.
2. The method according to claim 1, wherein the method further comprises:
determining the optimal alignment matrix according to the degree of aggregation of the source domain samples in the aligned space, such that second source domain samples, which belong to different classes among the source domain samples, have the minimum degree of aggregation in the aligned space.
3. The method according to claim 1, wherein determining the optimal alignment matrix according to the degree of aggregation of the source domain samples in the aligned space, such that the first source domain samples, which belong to the same class among the source domain samples, have the maximum degree of aggregation in the aligned space, comprises:
calculating the optimal alignment matrix such that the difference between the mean of the i-th class of samples in the source domain samples and the mean of all samples in the source domain samples is minimized.
4. The method according to claim 2, wherein determining the optimal alignment matrix according to the degree of aggregation of the source domain samples in the aligned space, such that the second source domain samples, which belong to different classes among the source domain samples, have the minimum degree of aggregation in the aligned space, comprises:
calculating the optimal alignment matrix such that the difference between the mean of the i-th class of samples in the source domain samples and the mean of all samples in the source domain samples is maximized.
5. The method according to claim 1, wherein determining the optimal alignment matrix according to the degree of aggregation of the source domain samples in the aligned space comprises:
constructing a first scatter matrix of the first source domain samples, which belong to the same class among the source domain samples, in the aligned space, the first scatter matrix characterizing the maximum degree of aggregation of the first source domain samples in the aligned space;
constructing a second scatter matrix of second source domain samples, which belong to different classes among the source domain samples, in the aligned space, the second scatter matrix characterizing the minimum degree of aggregation of the second source domain samples in the aligned space;
determining the optimal alignment matrix according to the first scatter matrix and the second scatter matrix.
6. The method according to claim 5, characterized in that the expression of the first scatter matrix S_w is constructed using the following formula (1), and the expression of the second scatter matrix S_b is constructed using the following formula (2):
Let:
where C denotes the number of classes to which the source domain samples S and the target domain samples T belong in the original space; n_i denotes the number of samples of the i-th class; μ_i denotes the mean of the samples of the i-th class; μ denotes the mean of all samples; P_s denotes the source domain subspace; M denotes the alignment matrix between the source domain subspace P_s and the target domain subspace P_t; the sample symbol denotes a specific sample; and the superscript T denotes matrix transposition.
7. The method according to claim 1, characterized in that the expression of the optimal alignment matrix is constructed by the following formula (5):
where λ, β ∈ (0, +∞] denote regularization constants; the alignment term denotes the distribution difference between the subspace of the source domain samples and the subspace of the target domain samples; S_w denotes the degree of aggregation, in the aligned space, of the first source domain samples, which belong to the same class among the source domain samples; S_b denotes the degree of dispersion, in the aligned space, of the second source domain samples, which belong to different classes among the source domain samples; M denotes the optimal alignment matrix; in λ tr(S_w), tr(S_w) is the trace of the matrix S_w; and in β tr(S_b), tr(S_b) is the trace of the matrix S_b.
8. The method according to claim 1, characterized in that, before determining the optimal alignment matrix according to the degree of aggregation of the source domain samples in the aligned space, the method further comprises:
performing dimensionality reduction on the source domain samples and the target domain samples in the original space, respectively, using principal component analysis (PCA), to obtain the subspace of the source domain samples and the subspace of the target domain samples.
9. A data processing apparatus, the apparatus comprising: a matrix determination unit, a spatial alignment unit, a training unit, and a recognition unit;
wherein the matrix determination unit is configured to determine an optimal alignment matrix according to the degree of aggregation of source domain samples in the aligned space, such that first source domain samples, which belong to the same class among the source domain samples, have the maximum degree of aggregation in the aligned space;
the spatial alignment unit is configured to align the subspace of the source domain samples with the subspace of target domain samples using the optimal alignment matrix, to obtain a source domain data set and a target domain data set;
the training unit is configured to train a nearest neighbor classifier according to the source domain data set and the target domain data set, to obtain a trained classifier;
the recognition unit is configured to perform classification and recognition on the unlabeled samples among the target domain samples using the trained classifier.
10. A data processing apparatus, the apparatus comprising: a memory and a processor;
wherein the memory is configured to store a computer program that can be run on the processor;
the processor is configured to perform the steps of the method according to any one of claims 1 to 8 when running the computer program.
CN201810859161.4A 2018-07-31 2018-07-31 Data processing method and device Active CN109165679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810859161.4A CN109165679B (en) 2018-07-31 2018-07-31 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810859161.4A CN109165679B (en) 2018-07-31 2018-07-31 Data processing method and device

Publications (2)

Publication Number Publication Date
CN109165679A true CN109165679A (en) 2019-01-08
CN109165679B CN109165679B (en) 2021-05-28

Family

ID=64898395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810859161.4A Active CN109165679B (en) 2018-07-31 2018-07-31 Data processing method and device

Country Status (1)

Country Link
CN (1) CN109165679B (en)


Also Published As

Publication number Publication date
CN109165679B (en) 2021-05-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant