CN111639698A - Sample fusion method and device - Google Patents

Sample fusion method and device

Info

Publication number
CN111639698A
CN111639698A
Authority
CN
China
Prior art keywords
label
matching
tag
fused
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010464546.8A
Other languages
Chinese (zh)
Inventor
苏英菲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Reach Automotive Technology Shenyang Co Ltd
Original Assignee
Neusoft Reach Automotive Technology Shenyang Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Reach Automotive Technology Shenyang Co Ltd filed Critical Neusoft Reach Automotive Technology Shenyang Co Ltd
Priority to CN202010464546.8A
Publication of CN111639698A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/251 Fusion techniques of input or preprocessed data

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a sample fusion method and a device, which relate to the technical field of data processing. The method comprises the following steps: obtaining a sample to be fused, wherein the sample to be fused carries the label information of an original label, and the label information of the original label comprises the label content and the label name of the original label; performing data reasoning on the sample to be fused through a pre-trained preset neural network to obtain a data reasoning result, wherein the data reasoning result comprises label information of a prediction label of the sample to be fused, and the label information of the predicted label comprises the label content and the label name of the predicted label; and performing sample fusion processing on the sample to be fused based on the label information of the prediction label of the sample to be fused and the label information of the original label in the data reasoning result. The method and the device alleviate the technical problems of low efficiency and poor accuracy in the conventional sample fusion approach.

Description

Sample fusion method and device
Technical Field
The invention relates to the technical field of data processing, in particular to a sample fusion method and a sample fusion device.
Background
With the rapid development of artificial intelligence technology, its applications have become increasingly widespread, for example in the field of image recognition. A common technical solution in artificial intelligence is to process data through a neural network, for example by performing recognition on an image through the neural network to obtain a recognition result. However, before a neural network can be used for image recognition it must be trained, and the training accuracy of the neural network affects the test accuracy, so training the neural network is important. Training a neural network requires a large sample set; for example, an open-source data set may be selected for training the network. However, because an open-source data set has a limited set of label categories, which may differ from the label categories of the desired data set, efficient fusion of the data sets is required. In the prior art, the open-source data set can be manually re-labeled so that it matches the desired label categories. Because the data volume of an open-source data set is large, such manual adjustment is inefficient and prone to error.
Disclosure of Invention
In view of the above, the present invention provides a sample fusion method and apparatus to alleviate the technical problems of low efficiency and poor accuracy of the conventional sample fusion method.
In a first aspect, an embodiment of the present invention provides a sample fusion method, including: obtaining a sample to be fused carrying label information of an original label; the label information of the original label comprises the label content and the label name of the original label; performing data reasoning on the sample to be fused through a pre-trained preset neural network to obtain a data reasoning result; the data reasoning result comprises label information of the prediction label of the sample to be fused; the label information of the predicted label comprises the label content and the label name of the predicted label; and performing sample fusion processing on the sample to be fused based on the label information of the prediction label of the sample to be fused and the label information of the original label in the data reasoning result.
Further, the sample fusion processing on the sample to be fused based on the label information of the prediction label of the sample to be fused and the label information of the original label in the data inference result includes: calculating an intersection ratio between the predicted label and the original label based on the label content of the predicted label and the label content of the original label; matching the predicted label and the original label based on the intersection ratio to obtain a label matching result; and performing fusion processing on the label of the sample to be fused based on the label matching result so as to realize the fusion processing on the sample to be fused.
Further, the fusing the label of the sample to be fused based on the label matching result comprises: and if the label matching result is that the predicted label contains a first matching label matched with the original label, performing fusion processing on the label of the sample to be fused based on the label name of the first matching label.
Further, the fusing the label of the sample to be fused based on the label name of the first matching label includes: and if the tag names of the first matching tag and the second matching tag are the same, determining the tag name and the tag content of the second matching tag as the tag name and the tag content of the first matching tag, wherein the second matching tag is a tag which is matched with the first matching tag in the original tag.
Further, the fusing the label of the sample to be fused based on the label name of the first matching label includes: if the tag names of the first matching tag and the second matching tag are different, judging whether the first matching tag and the second matching tag are synonymous tags, wherein synonymous tags indicate that the objects corresponding to the first matching tag and the second matching tag are the same; and if they are synonymous tags, replacing the tag content of the first matching tag with the tag content of the second matching tag.
Further, the method further comprises: if they are not synonymous tags, judging whether a preset tag list contains the tag name of the second matching tag; if so, replacing the tag name of the first matching tag with the tag name of the second matching tag, and replacing the tag content of the first matching tag with the tag content of the second matching tag; and if not, generating prompt information indicating that the tag information of the first matching tag is erroneous.
Further, the fusing the label of the sample to be fused based on the label matching result comprises: if the label matching result is that the label matched with the target predicted label in the predicted labels is not found in the original labels, the label content and the label name of the target predicted label are reserved.
Further, the fusing the label of the sample to be fused based on the label matching result comprises: and if the label matching result is that the original label contains a target original label, deleting the label content and the label name of the target original label, wherein the target original label is a label which is not contained in the data inference result and is contained in the original label.
Further, the data reasoning of the sample to be fused through the pre-trained preset neural network comprises: grouping the samples to be fused to obtain a plurality of groups of samples to be fused; performing data reasoning on each group of samples to be fused through a pre-trained preset neural network to obtain corresponding data reasoning results, and fusing the corresponding samples to be fused based on the corresponding data reasoning results to obtain fused samples; after the fused samples are obtained, training a preset neural network based on the fused samples, so that the trained preset neural network performs data reasoning on the next group of samples to be fused.
In a second aspect, an embodiment of the present invention further provides a sample fusion apparatus, including: an acquisition unit, configured to acquire a sample to be fused carrying label information of an original label, wherein the label information of the original label comprises the label content and the label name of the original label; a data reasoning unit, configured to perform data reasoning on the sample to be fused through a pre-trained preset neural network to obtain a data reasoning result, wherein the data reasoning result comprises label information of the prediction label of the sample to be fused, and the label information of the predicted label comprises the label content and the label name of the predicted label; and a fusion unit, configured to perform sample fusion processing on the sample to be fused based on the label information of the prediction label of the sample to be fused and the label information of the original label in the data inference result.
In the embodiment of the invention, firstly, a sample to be fused is obtained, wherein the sample to be fused carries the label information of an original label, then, data reasoning is carried out on the sample to be fused through a pre-trained preset neural network to obtain a data reasoning result, and finally, sample fusion processing is carried out on the sample to be fused based on the label information of a prediction label of the sample to be fused and the label information of the original label in the data reasoning result. By adopting the sample fusion method described above, a large number of data set samples meeting the requirements can be obtained quickly, which alleviates the technical problems of low efficiency and poor accuracy in the conventional sample fusion approach.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of a sample fusion method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another sample fusion method according to an embodiment of the present invention;
FIG. 3 is a schematic view of a sample fusion device according to an embodiment of the present invention;
fig. 4 is a schematic diagram of an electronic device according to an embodiment of the invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Embodiment one:
In accordance with an embodiment of the present invention, there is provided an embodiment of a sample fusion method. It should be noted that the steps illustrated in the flowchart of the accompanying drawings may be performed in a computer system such as a set of computer-executable instructions, and that, while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from the one given here.
Fig. 1 is a flow chart of a sample fusion method according to an embodiment of the present invention, as shown in fig. 1, the method including the steps of:
step S102, obtaining a sample to be fused carrying label information of an original label; the label information of the original label comprises the label content and the label name of the original label.
In this application, the sample to be fused may be an open-source data set, and may also be any other data set on which sample fusion can be performed, which is not specifically limited in this application.
Note that, the sample to be fused carries tag information of the original tag, for example, a tag name and a tag content of the original tag. Wherein the tag content can be understood as the location information of the original tag.
For example, if an object a1 is included in a sample to be fused, the original tag can be understood as a bounding box, and the object a1 is enclosed in the bounding box. At this time, the tag name may be the name information of the object a1, for example, if the object a1 is a bicycle, then the tag name may be bike; the label content may be position information of the enclosure frame, for example, upper left corner coordinates and lower right corner coordinates of the enclosure frame, and length and width information of the enclosure frame.
In the present application, the shape of the enclosure frame is not limited to a rectangular shape, and may be other shapes, for example, a circular shape, an oval shape, and the like.
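As a minimal illustration (not part of the patent text itself), the label information described above could be represented as in the following sketch; the field names `name` and `box`, and the (x_min, y_min, x_max, y_max) coordinate convention, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Label:
    # Label name: the class of the framed object, e.g. "bike".
    name: str
    # Label content: the bounding-box position, here (x_min, y_min, x_max, y_max).
    box: Tuple[int, int, int, int]

@dataclass
class Sample:
    image_path: str
    labels: List[Label]  # original labels carried by the sample to be fused

# Example: a sample containing one bicycle enclosed by a rectangular bounding box.
sample = Sample(image_path="img_0001.png",
                labels=[Label(name="bike", box=(120, 80, 360, 300))])
```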
Step S104, carrying out data reasoning on the sample to be fused through a pre-trained preset neural network to obtain a data reasoning result; the data reasoning result comprises label information of the prediction label of the sample to be fused; the label information of the predicted label comprises the label content and the label name of the predicted label.
In the present application, the preset neural network is a neural network after being trained in advance, and the structure of the neural network is not particularly limited in the present application. For example, the neural network may be trained through a small batch of data sets. The small-batch data set may be a data set manually labeled by a related technician in advance, or may be a data set obtained in another form, which is not specifically limited in this application.
In the application, the label content and the label name of the predicted label are obtained after the pre-trained preset neural network processes the sample to be fused.
Note that the label content here can be understood as the position information of the predicted label; the label name here can be understood as the name of the object framed by it.
And step S106, carrying out sample fusion processing on the sample to be fused based on the label information of the prediction label of the sample to be fused and the label information of the original label in the data reasoning result.
In the application, the sample fusion processing is performed on the sample to be fused, which can be understood as fusing the prediction tag and the original tag, so that the fused sample meets the user requirement.
In the embodiment of the invention, firstly, a sample to be fused is obtained, wherein the sample to be fused carries the label information of an original label, then, data reasoning is carried out on the sample to be fused through a pre-trained preset neural network to obtain a data reasoning result, and finally, sample fusion processing is carried out on the sample to be fused based on the label information of a prediction label of the sample to be fused and the label information of the original label in the data reasoning result. By adopting the sample fusion method described above, a large number of data set samples meeting the requirements can be obtained quickly, which alleviates the technical problems of low efficiency and poor accuracy in the conventional sample fusion approach.
As can be seen from the above description, in the present application, first, a sample to be fused is obtained, and then, data inference is performed on the sample to be fused through a pre-trained preset neural network, so as to obtain a data inference result.
In an optional embodiment, performing data inference on the sample to be fused through a pre-set neural network after pre-training may be described as the following process:
firstly, grouping the samples to be fused to obtain a plurality of groups of samples to be fused;
then, performing data reasoning on each group of samples to be fused through a pre-trained preset neural network to obtain corresponding data reasoning results, and fusing the corresponding samples to be fused based on the corresponding data reasoning results to obtain fused samples; after the fused samples are obtained, training a preset neural network based on the fused samples, so that the trained preset neural network performs data reasoning on the next group of samples to be fused.
Specifically, in the present application, after obtaining the samples to be fused, the samples to be fused may be grouped first, for example, the samples to be fused may be averagely divided into N groups. And then, carrying out data reasoning on each group of samples to be fused through a pre-trained preset neural network to obtain a corresponding data reasoning result. After the data reasoning result is obtained, sample fusion operation can be carried out on each group of samples to be fused.
For example, the plurality of sets of samples to be fused include: a group of samples to be fused Q1, a group of samples to be fused Q2, and a group of samples to be fused Q3. In the application, data inference is firstly carried out on a to-be-fused sample group Q1 through a pre-trained preset neural network to obtain a data inference result M1, and then sample fusion processing is carried out on the to-be-fused sample group Q1 based on the data inference result M1 to obtain a fused sample P1. Next, the preset neural network is trained again by the sample P1.
Then, data reasoning is carried out on the to-be-fused sample group Q2 through the trained preset neural network to obtain a data reasoning result M2, and then sample fusion processing is carried out on the to-be-fused sample group Q2 based on the data reasoning result M2 to obtain a fused sample P2. Next, the preset neural network is trained again by the sample P2.
Next, data inference is performed on the to-be-fused sample group Q3 through the trained preset neural network to obtain a data inference result M3, and then, sample fusion processing is performed on the to-be-fused sample group Q3 based on the data inference result M3 to obtain a fused sample P3.
According to the above description, in the present application, the processing precision of the preset neural network can be gradually improved by adopting a mode of fusion while training, and a more accurate data reasoning result can be obtained in the subsequent sample fusion processing process.
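A rough sketch of this train-while-fusing loop is given below; `train`, `infer` and `fuse_group` are hypothetical placeholder functions standing in for the preset neural network's training step, its data reasoning step, and the per-group fusion described later in this document, and are not defined by the patent.

```python
def iterative_fusion(model, groups, train, infer, fuse_group):
    """Fuse sample groups one by one, retraining the model on each fused group.

    groups      : list of sample groups Q1, Q2, ..., QN (the samples to be fused)
    train(m, s) : returns model m after further training on fused samples s
    infer(m, g) : returns the data reasoning result (predicted labels) for group g
    fuse_group  : fuses a group with its reasoning result and returns fused samples
    """
    fused_all = []
    for group in groups:
        reasoning_result = infer(model, group)       # data reasoning result Mi
        fused = fuse_group(group, reasoning_result)  # fused samples Pi
        fused_all.extend(fused)
        model = train(model, fused)                  # retrain before the next group
    return model, fused_all
```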
In an optional embodiment, as shown in fig. 2, in step S106, a sample fusion process is performed on the to-be-fused sample based on the label information of the prediction label of the to-be-fused sample and the label information of the original label in the data inference result, which includes the following processes:
step S11, calculating an intersection ratio between the predicted label and the original label based on the label content of the predicted label and the label content of the original label.
As can be seen from the above description, a label can be understood as a bounding box containing the object to be identified. Each predicted label corresponds to one bounding box, and each bounding box corresponds to one label name and one piece of label content; similarly, each original label corresponds to one bounding box with its own label name and label content.
Based on this, in the present application, an intersection ratio between the predicted label and the original label may be calculated, where the calculation formula of the intersection ratio is:
IOU = (A ∩ B) / (A ∪ B)
where A can be understood as the area enclosed by the predicted label, B as the area enclosed by the original label, and IOU is the intersection ratio (intersection over union).
In an alternative embodiment, the intersection ratio between every predicted label and every original label may be calculated. Alternatively, for each predicted label, the original labels that intersect it may first be determined among the plurality of original labels, and the intersection ratio may then be calculated only between the predicted label and those intersecting original labels.
In this method, calculating the intersection ratio only between each predicted label and the original labels that intersect it avoids unnecessary calculations, which simplifies the computation and improves data processing efficiency.
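For axis-aligned rectangular bounding boxes, the intersection ratio could be computed as in the following sketch; the (x_min, y_min, x_max, y_max) box representation is an assumption made for illustration only.

```python
def iou(box_a, box_b):
    """Intersection ratio (IOU) of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Coordinates of the intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Only label pairs whose boxes actually intersect need to be compared,
# which avoids unnecessary IOU computations over non-overlapping pairs.
```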
And step S12, matching the predicted label and the original label based on the intersection ratio to obtain a label matching result.
As can be seen from the above description, in the present application, after the intersection ratio between the predicted tag and the original tag is calculated according to the above-described process, the predicted tag and the original tag can be matched based on the intersection ratio. For example, a threshold may be preset, and then the predicted tag and the original tag whose intersection ratio is greater than the threshold may be determined as matching tags.
In this application, the tag matching result indicates whether the predicted tag and the original tag include matching tags, where tag matching means: the intersection ratio between the predicted label and the original label is greater than the threshold.
And step S13, fusing the labels of the samples to be fused based on the label matching result so as to realize the fusion of the samples to be fused.
In the present application, after the tag matching result is obtained in the manner described above, the tag of the sample to be fused may be fused based on the tag matching result, so as to implement the fusion of the sample to be fused.
As can be seen from the above description, in the present application, matching the predicted labels and the original labels through the intersection ratio allows the matching to be performed more quickly and accurately. With a more accurate label matching result, the sample fusion precision can be further improved.
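A possible (greedy) implementation of this threshold-based matching is sketched below, reusing the `iou` helper above; the default threshold of 0.5 is an illustrative assumption, not a value specified by the patent.

```python
def match_labels(predicted, original, threshold=0.5):
    """Return (matches, unmatched_predicted, unmatched_original).

    matches is a list of (predicted_label, original_label) pairs whose
    bounding boxes have an intersection ratio greater than the threshold.
    """
    matches, used_original = [], set()
    for p in predicted:
        best, best_iou = None, threshold
        for j, o in enumerate(original):
            if j in used_original:
                continue
            score = iou(p.box, o.box)
            if score > best_iou:
                best, best_iou = j, score
        if best is not None:
            used_original.add(best)
            matches.append((p, original[best]))
    unmatched_pred = [p for p in predicted if all(p is not m[0] for m in matches)]
    unmatched_orig = [o for j, o in enumerate(original) if j not in used_original]
    return matches, unmatched_pred, unmatched_orig
```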
In an optional embodiment, in step S13, the process of fusing the tags of the sample to be fused based on the tag matching result includes the following steps:
and if the label matching result is that the predicted label contains a first matching label matched with the original label, performing fusion processing on the label of the sample to be fused based on the label name of the first matching label.
As can be seen from the above description, in the present application, the predicted label and the original label are matched in a way of calculating the cross-over ratio. If the obtained tag matching result is that the predicted tag contains a first matching tag matched with the original tag, the tag name of the first matching tag can be further judged, so that the tags of the sample to be fused are fused according to the judgment result.
For example, by calculating the intersection ratio it can be determined that label a among the predicted labels and label b among the original labels match, that is, the intersection ratio of label a and label b is greater than the threshold. In this case, label a is the first matching label and label b is the second matching label, and the labels of the sample to be fused may be further fused based on the label name of label a and the label name of label b.
Specifically, the tag name of the first matching tag and the tag name of the second matching tag may be compared. And if the comparison result shows that the tag names of the first matching tag and the second matching tag are the same, determining the tag name and the tag content of the second matching tag as the tag name and the tag content of the first matching tag, wherein the second matching tag is the tag matched with the first matching tag in the original tags.
That is, in the present application, the first matching tag is a tag in the predicted tag, the second matching tag is a tag in the original tag, and the first matching tag matches with the second matching tag. If the label name of the first matching label is bike and the label name of the second matching label is also bike, the label name of the first matching label is the same as the label name of the second matching label, at this time, the label name of the second matching label is determined as the label name of the first matching label, and the label content of the second matching label is determined as the label content of the first matching label.
It should be noted that, in the present application, the pre-trained preset neural network is a model obtained by training on a small batch of data, so its processing accuracy may be below a certain accuracy threshold. The original labels in the open-source data set, by contrast, are typically labels that were manually annotated by technicians, so the reliability and accuracy of the label name and label content of an original label are higher than those of a predicted label. Therefore, if the label names of the first matching label and the second matching label are the same, the label name and label content of the second matching label may be determined as the label name and label content of the first matching label.
If the comparison result indicates that the label names of the first matching label and the second matching label are different, it is judged whether the first matching label and the second matching label are synonymous labels; synonymous labels indicate that the objects corresponding to the first matching label and the second matching label are the same. If they are synonymous labels, the label content of the first matching label is replaced with the label content of the second matching label.
In the present application, multiple synonym pairs are preset; for example, bike and bicycle form a synonym pair, and cat and catt form a synonym pair. That is to say, the names of two synonymous labels are different, but the objects corresponding to the two labels are the same.
Therefore, if it is determined based on the comparison result that the label names of the first matching label and the second matching label are not the same, it may then be determined whether the first matching label and the second matching label belong to one of the preset synonym pairs. For example, if the label name of the first matching label is bike and the label name of the second matching label is bicycle, the two labels are determined to be synonymous. In this case, the label name of the first matching label is retained, and the label content of the first matching label is replaced with the label content of the second matching label.
The reason why the label name of the first matching label is retained is that, although the label names of the first and second matching labels are not identical, they denote the same object, so the predicted name can be kept. The reason for replacing the label content of the first matching label with that of the second matching label is that the label content of the second matching label is more accurate, so that other neural networks later trained with this sample will yield a more accurate network model.
In another alternative embodiment, if the first matching label and the second matching label are not synonymous labels, that is, they do not denote the same object, it may be further determined whether the preset label list contains the label name of the second matching label. The preset label list is a preset list of the labels that the pre-trained preset neural network can recognize.
If the preset tag list is judged to contain the tag name of the second matching tag, the prediction of the preset neural network is possibly wrong, and at the moment, the tag name of the first matching tag can be replaced by the tag name of the second matching tag; and replacing the tag content of the first matching tag with the tag content of the second matching tag.
If the preset label list does not contain the label name of the second matching label, prompt information is generated to indicate that the label information of the first matching label is erroneous. A technician can then manually adjust the label information of the first matching label based on the prompt information.
As described above, by fusing the labels of the samples to be fused in this manner, the sample data set required by the user can be obtained more quickly and accurately. Furthermore, by combining this with manual adjustment, the label precision of the samples to be fused can be further improved while the sample fusion time is shortened.
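The matched-pair fusion rules described above could be sketched as follows; `SYNONYM_PAIRS` and `PRESET_LABEL_LIST` are hypothetical placeholders for the preset synonym pairs and the preset label list, and the concrete values shown are illustrative only.

```python
# Illustrative placeholders, not values defined by the patent.
SYNONYM_PAIRS = {("bike", "bicycle"), ("bicycle", "bike")}
PRESET_LABEL_LIST = {"bike", "car", "person", "dog", "cat"}

def reconcile(first, second):
    """Fuse a matched (predicted, original) label pair in place on `first`.

    Returns None on success, or a prompt message when the predicted label
    appears to be erroneous and needs manual adjustment.
    """
    if first.name == second.name:
        # Same name: trust the manually annotated original label entirely.
        first.name, first.box = second.name, second.box
    elif (first.name, second.name) in SYNONYM_PAIRS:
        # Synonymous names denote the same object: keep the predicted name,
        # but take the more accurate position from the original label.
        first.box = second.box
    elif second.name in PRESET_LABEL_LIST:
        # The network can recognize this class, so its prediction was likely
        # wrong: replace both the name and the content with the original ones.
        first.name, first.box = second.name, second.box
    else:
        return f"label information of predicted label '{first.name}' may be erroneous"
    return None
```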
The sample fusion process described above applies when the label matching result indicates that the predicted labels include a first matching label that matches an original label. The following describes how predicted labels and original labels that have no match are handled.
Case one:
if the label matching result is that the label matched with the target predicted label in the predicted labels is not found in the original labels, the label content and the label name of the target predicted label are reserved.
For example, the target prediction tag "dog" is included in the prediction tag, but the target prediction tag "dog" is not included in the original tag, and then the tag information of the target prediction tag "dog", that is, the tag name and the tag content, is retained at this time.
Case two:
and if the label matching result is that the original label contains a target original label, deleting the label content and the label name of the target original label, wherein the target original label is a label which is not contained in the data inference result and is contained in the original label.
For example, the original tag contains a target original tag cat, which is only present in the original tag but not present in the predicted tag, and at this time, the tag content and the tag name of the target original tag may be deleted. It should be appreciated that the absence of the target original tag in the predicted tag indicates that the target original tag is not needed by the user, and the tag information of the tag may be deleted.
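Combining the matched and unmatched cases, the per-sample fusion step might be assembled as below, reusing the `match_labels` and `reconcile` sketches given earlier; this is an illustrative outline rather than the patent's exact procedure.

```python
def fuse_sample(predicted, original, threshold=0.5):
    """Fuse one sample's predicted and original labels into a final label set."""
    matches, unmatched_pred, unmatched_orig = match_labels(predicted, original, threshold)
    prompts = []
    for first, second in matches:
        msg = reconcile(first, second)
        if msg:
            prompts.append(msg)
    # Case one: predicted labels with no matching original label are kept as-is.
    fused = [p for p, _ in matches] + unmatched_pred
    # Case two: original labels absent from the reasoning result are dropped,
    # i.e. unmatched_orig is simply not carried over into the fused labels.
    return fused, prompts
```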
Embodiment two:
the embodiment of the present invention further provides a sample fusion device, which is mainly used for executing the sample fusion method provided by the above-mentioned content of the embodiment of the present invention, and the sample fusion device provided by the embodiment of the present invention is specifically described below.
Fig. 3 is a schematic diagram of a sample fusion device according to an embodiment of the present invention, as shown in fig. 3, the sample fusion device mainly includes: an acquisition unit 10, a data inference unit 20 and a fusion unit 30, wherein:
the acquiring unit 10 is used for acquiring a sample to be fused carrying label information of an original label; the label information of the original label comprises the label content and the label name of the original label;
the data reasoning unit 20 is used for performing data reasoning on the sample to be fused through a pre-trained preset neural network to obtain a data reasoning result; the data reasoning result comprises label information of the prediction label of the sample to be fused; the label information of the predicted label comprises the label content and the label name of the predicted label;
and the fusion unit 30 is configured to perform sample fusion processing on the to-be-fused sample based on the label information of the prediction label of the to-be-fused sample and the label information of the original label in the data inference result.
In the embodiment of the invention, firstly, a sample to be fused is obtained, wherein the sample to be fused carries the label information of an original label, then, data reasoning is carried out on the sample to be fused through a pre-trained preset neural network to obtain a data reasoning result, and finally, sample fusion processing is carried out on the sample to be fused based on the label information of a prediction label of the sample to be fused and the label information of the original label in the data reasoning result. By adopting the sample fusion method described above, a large number of data set samples meeting the requirements can be obtained quickly, which alleviates the technical problems of low efficiency and poor accuracy in the conventional sample fusion approach.
Optionally, the fusion unit is configured to: calculating an intersection ratio between the predicted label and the original label based on the label content of the predicted label and the label content of the original label; matching the predicted label and the original label based on the intersection ratio to obtain a label matching result; and performing fusion processing on the label of the sample to be fused based on the label matching result so as to realize the fusion processing on the sample to be fused.
Optionally, the fusion unit is further configured to: and if the label matching result is that the predicted label contains a first matching label matched with the original label, performing fusion processing on the label of the sample to be fused based on the label name of the first matching label.
Optionally, the fusion unit is further configured to: and if the tag names of the first matching tag and the second matching tag are the same, determining the tag name and the tag content of the second matching tag as the tag name and the tag content of the first matching tag, wherein the second matching tag is a tag which is matched with the first matching tag in the original tag.
Optionally, the fusion unit is further configured to: if the tag names of the first matching tag and the second matching tag are different, judge whether the first matching tag and the second matching tag are synonymous tags, wherein synonymous tags indicate that the objects corresponding to the first matching tag and the second matching tag are the same; and if they are synonymous tags, replace the tag content of the first matching tag with the tag content of the second matching tag.
Optionally, the fusion unit is further configured to: if they are not synonymous tags, judge whether a preset label list contains the label name of the second matching label; if so, replace the label name of the first matching label with the label name of the second matching label; and if not, generate prompt information indicating that the label name of the first matching label is erroneous.
Optionally, the fusion unit is further configured to: if the label matching result is that no label matching a target predicted label among the predicted labels is found in the original labels, retain the label content and the label name of the target predicted label.
Optionally, the fusion unit is further configured to: and if the label matching result is that the original label contains a target original label, deleting the label content and the label name of the target original label, wherein the target original label is a label which is not contained in the data inference result and is contained in the original label.
Optionally, the data inference unit is configured to: grouping the samples to be fused to obtain a plurality of groups of samples to be fused; performing data reasoning on each group of samples to be fused through a pre-trained preset neural network to obtain corresponding data reasoning results, and fusing the corresponding samples to be fused based on the corresponding data reasoning results to obtain fused samples; after the fused samples are obtained, training a preset neural network based on the fused samples, so that the trained preset neural network performs data reasoning on the next group of samples to be fused.
The device provided by the embodiment of the present invention has the same implementation principle and technical effect as the method embodiments, and for the sake of brief description, reference may be made to the corresponding contents in the method embodiments without reference to the device embodiments.
Embodiment three:
referring to fig. 4, an embodiment of the present invention further provides an electronic device 100, including: a processor 40, a memory 41, a bus 42 and a communication interface 43, wherein the processor 40, the communication interface 43 and the memory 41 are connected through the bus 42; the processor 40 is arranged to execute executable modules, such as computer programs, stored in the memory 41.
The Memory 41 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 43 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, etc. may be used.
The bus 42 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 4, but that does not indicate only one bus or one type of bus.
The memory 41 is used for storing a program, the processor 40 executes the program after receiving an execution instruction, and the method executed by the apparatus defined by the flow process disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 40, or implemented by the processor 40.
The processor 40 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 40. The Processor 40 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component. The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in a memory 41, and the processor 40 reads the information in the memory 41 and completes the steps of the method in combination with the hardware thereof.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting the same, and the protection scope of the present invention is not limited thereto, although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of sample fusion, comprising:
obtaining a sample to be fused carrying label information of an original label; the label information of the original label comprises the label content and the label name of the original label;
performing data reasoning on the sample to be fused through a pre-trained preset neural network to obtain a data reasoning result; the data reasoning result comprises label information of the prediction label of the sample to be fused; the label information of the predicted label comprises the label content and the label name of the predicted label;
and performing sample fusion processing on the sample to be fused based on the label information of the prediction label of the sample to be fused and the label information of the original label in the data reasoning result.
2. The method according to claim 1, wherein performing sample fusion processing on the to-be-fused sample based on the label information of the predictive label and the label information of the original label of the to-be-fused sample in the data inference result comprises:
calculating an intersection ratio between the predicted label and the original label based on the label content of the predicted label and the label content of the original label;
matching the predicted label and the original label based on the intersection ratio to obtain a label matching result;
and performing fusion processing on the label of the sample to be fused based on the label matching result so as to realize the fusion processing on the sample to be fused.
3. The method according to claim 2, wherein fusing the labels of the samples to be fused based on the label matching result comprises:
and if the label matching result is that the predicted label contains a first matching label matched with the original label, performing fusion processing on the label of the sample to be fused based on the label name of the first matching label.
4. The method according to claim 3, wherein fusing the tags of the to-be-fused sample based on the tag name of the first matching tag comprises:
and if the tag names of the first matching tag and the second matching tag are the same, determining the tag name and the tag content of the second matching tag as the tag name and the tag content of the first matching tag, wherein the second matching tag is a tag which is matched with the first matching tag in the original tag.
5. The method according to claim 4, wherein fusing the tags of the to-be-fused sample based on the tag name of the first matching tag comprises:
if the tag names of the first matching tag and the second matching tag are different, judging whether the first matching tag and the second matching tag are synonymous tags or not, wherein synonymous tags indicate that the objects corresponding to the first matching tag and the second matching tag are the same;
and if they are synonymous tags, replacing the tag content of the first matching tag with the tag content of the second matching tag.
6. The method of claim 5, further comprising:
if they are not synonymous tags, judging whether a preset tag list contains the tag name of the second matching tag;
if so, replacing the label name of the first matching label with the label name of the second matching label, and replacing the label content of the first matching label with the label content of the second matching label;
and if not, generating prompt information indicating that the tag information of the first matching tag is erroneous.
7. The method according to claim 2, wherein fusing the labels of the samples to be fused based on the label matching result comprises:
if the label matching result is that the label matched with the target predicted label in the predicted labels is not found in the original labels, the label content and the label name of the target predicted label are reserved.
8. The method according to claim 2, wherein fusing the labels of the samples to be fused based on the label matching result comprises:
and if the label matching result is that the original label contains a target original label, deleting the label content and the label name of the target original label, wherein the target original label is a label which is not contained in the data inference result and is contained in the original label.
9. The method according to claim 2, wherein the data inference of the sample to be fused through the pre-trained preset neural network comprises:
grouping the samples to be fused to obtain a plurality of groups of samples to be fused;
performing data reasoning on each group of samples to be fused through a pre-trained preset neural network to obtain corresponding data reasoning results, and fusing the corresponding samples to be fused based on the corresponding data reasoning results to obtain fused samples;
after the fused samples are obtained, training a preset neural network based on the fused samples, so that the trained preset neural network performs data reasoning on the next group of samples to be fused.
10. A sample fusion device, comprising:
an acquisition unit, configured to acquire a sample to be fused carrying label information of an original label, wherein the label information of the original label comprises the label content and the label name of the original label;
the data reasoning unit is used for carrying out data reasoning on the sample to be fused through a pre-trained preset neural network to obtain a data reasoning result; the data reasoning result comprises label information of the prediction label of the sample to be fused; the label information of the predicted label comprises the label content and the label name of the predicted label;
and the fusion unit is used for carrying out sample fusion processing on the sample to be fused based on the label information of the prediction label of the sample to be fused and the label information of the original label in the data inference result.
CN202010464546.8A 2020-05-27 2020-05-27 Sample fusion method and device Pending CN111639698A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010464546.8A CN111639698A (en) 2020-05-27 2020-05-27 Sample fusion method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010464546.8A CN111639698A (en) 2020-05-27 2020-05-27 Sample fusion method and device

Publications (1)

Publication Number Publication Date
CN111639698A 2020-09-08

Family

ID=72329722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010464546.8A Pending CN111639698A (en) 2020-05-27 2020-05-27 Sample fusion method and device

Country Status (1)

Country Link
CN (1) CN111639698A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596266A (en) * 2018-05-02 2018-09-28 深圳市易成自动驾驶技术有限公司 Blending decision method, device based on semi-supervised learning and storage medium
CN109460821A (en) * 2018-10-29 2019-03-12 重庆中科云丛科技有限公司 A kind of neural network compression method, device, electronic equipment and storage medium
CN110163376A (en) * 2018-06-04 2019-08-23 腾讯科技(深圳)有限公司 Sample testing method, the recognition methods of media object, device, terminal and medium
CN111144498A (en) * 2019-12-26 2020-05-12 深圳集智数字科技有限公司 Image identification method and device


Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200908)