CN110910982A - Self-coding model training method, device, equipment and storage medium - Google Patents

Self-coding model training method, device, equipment and storage medium

Info

Publication number
CN110910982A
CN110910982A (application CN201911065670.0A)
Authority
CN
China
Prior art keywords
self
coding
detection result
model
coding model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911065670.0A
Other languages
Chinese (zh)
Inventor
陶然
刘怀学
李映华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Kingmed Diagnostics Central Co Ltd
Original Assignee
Guangzhou Kingmed Diagnostics Central Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Kingmed Diagnostics Central Co Ltd filed Critical Guangzhou Kingmed Diagnostics Central Co Ltd
Priority to CN201911065670.0A priority Critical patent/CN110910982A/en
Publication of CN110910982A publication Critical patent/CN110910982A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00 ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a self-coding model training method, which comprises the following steps: acquiring the detection results in a report sheet and precoding them; adding noise to the precoded detection results; inputting the noise-added detection results into a preset self-coding model, where the self-coding model comprises an encoder and a decoder; optimizing the self-coding model with an optimizer; and, when the loss function of the optimized self-coding model has stabilized, removing the decoder from the self-coding model and outputting the remaining model as the trained feature expression model. The invention also discloses a self-coding model training device, equipment and a computer storage medium. With the embodiments of the invention, the trained feature expression model facilitates exploration of the space of report-sheet detection results and addresses problems such as incomplete coverage of feature variables and low efficiency that arise when features of medical examination report results are constructed manually.

Description

Self-coding model training method, device, equipment and storage medium
Technical Field
The invention relates to the field of data coding, and in particular to a self-coding model training method, device, equipment and storage medium.
Background
Currently, result analysis for a medical detection report mainly analyzes the result values of the detection items in a certain type of report, comparing the detected result values with statistical reference values to obtain the final report result. Most report results are documented through extensive testing and clinical observation during patient treatment, but considerable research and mining space remains in the examination of report results. Detecting an examinee with multiple detection methods at a specific time point can improve the accuracy of the detection results, give a more comprehensive picture of the current state of the organism, and provide more detailed physical data of the patient for clinical treatment. However, as the number of test items and accumulated reports increases, the challenge grows. The main reasons are that human biological state information is projected into a high-dimensional data space through the detection results, so it becomes increasingly difficult to analyze correlations between detection items and clinical manifestations with conventional statistical methods; feature construction for the detection items is inefficient and the coverage of feature variables is incomplete, which makes the whole process of analyzing detection-item data lengthy and expensive.
Disclosure of Invention
The embodiments of the invention aim to provide a self-coding model training method, device, equipment and storage medium, in which the trained feature expression model facilitates exploration of the report-sheet detection result space and addresses problems such as incomplete coverage of feature variables and low efficiency in manually constructed features of medical examination report results.
In order to achieve the above object, an embodiment of the present invention provides a method for training a self-coding model, including:
acquiring a detection result in a report sheet, and precoding the detection result;
adding noise to the detection result after precoding;
inputting the detection result added with the noise into a preset self-coding model; wherein the self-encoding model comprises an encoder and a decoder;
optimizing the self-encoding model with an optimizer;
when the loss function of the optimized self-coding model is stable, removing the decoder in the self-coding model, and outputting the self-coding model with the decoder removed as a trained feature expression model.
Compared with the prior art, in the self-coding model training method disclosed by the embodiment of the invention, the detection result is first precoded and noise is added to the precoded detection result; the noise-added detection result is then input into a preset self-coding model, which is optimized with an optimizer; finally, when the loss function of the optimized self-coding model has stabilized, the decoder is removed and the self-coding model without the decoder is output as the trained feature expression model. The trained feature expression model facilitates exploration of the report-sheet detection result space and addresses problems such as incomplete feature-variable coverage and low efficiency in manually constructed features of medical examination report results.
As an improvement of the above scheme, the pre-coding the detection result specifically includes:
transversely arranging the detection results according to preset detection item codes; the detection result corresponding to the detection item which is not detected currently is empty, and the position of the detection result in the arrangement is reserved;
then, the self-coding model is Vanilla AutoEncoder or Sparse AutoEncoder.
As an improvement of the above scheme, the pre-coding the detection result specifically includes:
sequencing the detection results according to the time for generating the detection results;
then, the self-encoding model is LSTM AutoEncoder.
As an improvement of the above scheme, the pre-coding the detection result specifically includes:
arranging the detection results according to a preset arrangement rule; the preset arrangement rule is used for carrying out hierarchical division according to the category, department and/or subject of the detection item corresponding to the detection result;
then, the self-coding model is Convolution/Deconvolution AutoEncoder.
As an improvement of the above scheme, the pre-coding the detection result specifically includes:
sequencing the detection results according to a preset three-dimensional model; wherein the three-dimensional model comprises a plurality of slices representing different test packages, each of the slices comprising a plurality of the test results;
then, the self-coding model is Convolution/Deconvolution AutoEncoder.
As an improvement of the above scheme, the adding noise to the detection result after performing precoding specifically includes:
and adding random noise which obeys specific distribution to the detection result subjected to precoding.
The embodiment of the present invention further provides a self-coding model training apparatus, including:
the pre-coding module is used for acquiring the detection result in the report list and pre-coding the detection result;
a noise adding module, configured to add noise to the detection result after precoding;
the detection result input module is used for inputting the detection result added with the noise into a preset self-coding model; wherein the self-encoding model comprises an encoder and a decoder;
a self-coding model optimization module for optimizing the self-coding model using an optimizer;
and the characteristic expression module is used for removing the decoder in the self-coding model when the loss function of the optimized self-coding model is stable so as to output the self-coding model with the decoder removed as the trained characteristic expression model.
Compared with the prior art, in the self-coding model training device disclosed by the embodiment of the invention, the pre-coding module first precodes the detection result and the noise adding module adds noise to the precoded detection result; the detection result input module then inputs the noise-added detection result into a preset self-coding model, and the self-coding model optimization module optimizes the self-coding model with an optimizer; finally, when the loss function of the optimized self-coding model has stabilized, the decoder is removed and the self-coding model without the decoder is output as the trained feature expression model. The trained feature expression model facilitates exploration of the report-sheet detection result space and addresses problems such as incomplete feature-variable coverage and low efficiency in manually constructed features of medical examination report results.
As an improvement of the above scheme, the noise adding module is specifically configured to:
and adding random noise which obeys a specific distribution to the detection result in the report subjected to precoding.
To achieve the above object, an embodiment of the present invention further provides a self-coding model training apparatus, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor implements the self-coding model training method according to any one of the above embodiments when executing the computer program.
In order to achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, where when the computer program runs, the apparatus on which the computer-readable storage medium is located is controlled to perform the self-coding model training method according to any one of the above embodiments.
Drawings
FIG. 1 is a flow chart of a method for training a self-coding model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of matrix dimension coding provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of tensor dimension coding provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of adding noise to the detection result according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a self-encoder;
FIG. 6 is a schematic diagram of data input and output in the self-encoder;
FIG. 7 is a schematic structural diagram of a self-coding model training apparatus according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a self-coding model training apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that in the self-coding model training method according to the embodiment of the invention, the detection items are the items presented in a patient's report sheet. The report sheet is a patient's detection report; it may be an electronic report sheet, or an electronic report sheet generated by machine recognition of a paper report sheet (written by a doctor/patient), so that the information in the report sheet can be extracted automatically and the detection items in the report sheet determined. It should be noted that the process of identifying/extracting information from the report sheet may follow data processing methods in the prior art, and the invention is not limited in this respect.
Illustratively, the data to be encoded falls into two blocks. The first block consists of fields closely related to the dimension of the measurement result, i.e. nominal variables, such as the unit of a measurement item, the name of the reagent used, the name of the measurement device used, and the measurement method used; information from these fields, which affects the numerical dimension of the measurement result, must be retained through synchronous encoding so that no information is lost. The second block is the detection result corresponding to each detection item; this is the main data and needs to be fully encoded, and in the embodiment of the invention the trained self-coding model is used to encode the detection results.
Referring to fig. 1, fig. 1 is a flowchart of a self-coding model training method according to an embodiment of the present invention; the self-coding model training method comprises the following steps:
s1, obtaining the detection result in the report sheet, and pre-coding the detection result;
s2, adding noise to the detection result after precoding;
s3, inputting the detection result added with the noise into a preset self-coding model; wherein the self-encoding model comprises an encoder and a decoder;
s4, optimizing the self-coding model by utilizing an optimizer;
s5, when the loss function of the self-coding model after optimization is stable, removing the decoder in the self-coding model, and outputting the self-coding model with the decoder removed as the trained feature expression model.
Specifically, in step S1, the detection result can be precoded in one of four coding modes: vector dimension encoding, time dimension encoding, matrix dimension encoding, and tensor dimension encoding.
The first scheme is as follows: vector dimension coding, namely transversely arranging the detection results according to a preset detection item code; the detection result corresponding to the detection item which is not detected currently is empty, and the position of the detection result in the arrangement is reserved; the codes of the detection items, namely the unique identifications of the detection items in a laboratory, are generally arranged in order, so that the writing and reading back of the program coding results are convenient. For example, there are 2000 items detected in total, 7 items detected this time, and the vector will have 2000 values, 7 being the result of value detection, and the others being special values (0) representing NULL values.
Scheme II: time dimension coding, in which the detection results are sorted by the time at which they were produced. Items without a detection result are excluded. For example, if only 7 of 2000 detection items were detected, the vector contains only those 7 detection results, sorted in chronological order.
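A corresponding sketch of the time-dimension precoding, under the same assumptions (timestamps and values are hypothetical); only items that actually have a result are kept, ordered by the time the result was produced:

```python
import numpy as np

def time_encode(results_with_time):
    """Scheme two: keep only measured items, sorted by the time the result was produced."""
    ordered = sorted(results_with_time, key=lambda pair: pair[0])
    return np.array([value for _, value in ordered], dtype=np.float32)

# three of the measured results shown; items without a result are simply absent
x_seq = time_encode([("2019-11-04 08:31", 1.334),
                     ("2019-11-04 08:02", 3.555),
                     ("2019-11-04 09:15", 1.1)])    # values returned in chronological order
```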
The third scheme is as follows: matrix dimension coding, namely arranging the detection results according to a preset arrangement rule; and the preset arrangement rule is used for carrying out hierarchical division according to the category, department and/or subject of the detection item corresponding to the detection result. Specifically, the detection results are arranged in a two-dimensional table manner. Because the results of the detection items have correlation, whether the arrangement of the detection items in the two-dimensional table is reasonable may hinder the neural network from extracting the relevant information, and the arrangement rule of the detection items needs to be specially designed. The detection items with obvious relevance are located at the spatial adjacent positions as much as possible.
As shown in fig. 2, in the embodiment of the invention the hierarchy is divided by the subject to which a detection item belongs, the department where it is tested, the detection package, the item combination, and the single item, so that the detection result values are laid out reasonably in a two-dimensional space as a map. Because the result values must be aligned into a matrix, this encoding wastes a certain number of bytes, but it suits convolutional neural network processing and can greatly improve the parameter efficiency of the model trained later. The specific arrangement rules are adjusted and optimized using laboratory and medical-testing domain knowledge. This coding method also makes the detection results easier to visualize: one can see intuitively which subject a detection result belongs to and which detection package of which department it comes from, which facilitates subsequent visual analysis.
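A minimal sketch of the matrix-dimension precoding; the mapping from each item code to a (row, column) cell of the two-dimensional map is an assumed input that would be derived from the subject/department/package hierarchy described above:

```python
import numpy as np

def matrix_encode(results, layout, shape):
    """Scheme three: place each result value into the 2-D map cell fixed by the layout rule.

    results -- dict item code -> value
    layout  -- dict item code -> (row, col), designed so related items sit in adjacent cells
    shape   -- (rows, cols) of the map; unused cells stay 0
    """
    grid = np.zeros(shape, dtype=np.float32)
    for code, value in results.items():
        if code in layout:
            row, col = layout[code]
            grid[row, col] = value
    return grid

# hypothetical layout keeping two related liver-function items in adjacent cells
layout = {"ALT": (0, 0), "AST": (0, 1), "GLU": (5, 3)}
grid = matrix_encode({"ALT": 32.0, "AST": 28.5}, layout, shape=(16, 16))
```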
And the scheme is as follows: tensor dimension coding, namely sequencing the detection results according to a preset three-dimensional model; wherein the three-dimensional model is presented in the form of a three-dimensional table (tensor), said three-dimensional model comprising a number of slices (channels) representing different test packages, each of said slices comprising a number of said test results.
Each slice represents a collection of detection item values, and the detection items within a slice may be organized according to the third scheme. The detection package serves as the channel dimension, i.e. each slice represents one detection package, so if there are 100 packages in total there are one hundred slices. The number of elements in a slice (length x height of the cube) must be greater than or equal to the number of detection items in the largest package; the other rules are similar to the third scheme, and the layout of the detection results is shown in fig. 3. In the figure, slice A holds 7 boxes, B holds 4, and C holds 12, so each slice must provide at least 12 boxes; if a slice had fewer than 12 boxes, C could not be placed in the third slice.
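A sketch of the tensor-dimension precoding under the same assumptions; each slice (channel) corresponds to one detection package and the slice size is chosen to hold at least the largest package (12 items in the A/B/C example above):

```python
import numpy as np

def tensor_encode(package_results, package_index, slice_shape):
    """Scheme four: one slice per detection package, result values filled into each slice.

    package_results -- dict package name -> list of result values for that package
    package_index   -- dict package name -> channel index
    slice_shape     -- (h, w); h*w must be >= the item count of the largest package
    """
    tensor = np.zeros((len(package_index), *slice_shape), dtype=np.float32)
    for name, values in package_results.items():
        flat = tensor[package_index[name]].reshape(-1)   # view into that package's slice
        flat[:len(values)] = values                      # remaining cells stay 0 (NULL)
    return tensor

# packages A (7 items), B (4) and C (12): a 4x3 slice (12 cells) is just large enough
idx = {"A": 0, "B": 1, "C": 2}
t = tensor_encode({"A": [1.1] * 7, "C": [2.2] * 12}, idx, slice_shape=(4, 3))
```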
Specifically, in step S2, a denoising autoencoder (Denoising AutoEncoder) approach is used to add noise to the input data. Current noise-adding methods mostly fall into two types, as shown in fig. 4: one is to add random noise v obeying a specific distribution, such as a Gaussian distribution, to the precoded detection result; the other is to randomly set a certain ratio of the input x to 0. The ratio here refers to the proportion of elements in the detection result that become 0, for example [1.334, 1.335, 3.555, 1.1] → [0, 1.335, 3.555, 0], where two values become 0 and the ratio is 2/4 = 50%. The specific ratio is an empirical parameter, tuned according to the size of the actual data and the experimental results.
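The two corruption strategies described here can be sketched as follows; the noise scale and the zeroing ratio are the empirical parameters mentioned above, and the values shown are assumptions:

```python
import numpy as np

def add_gaussian_noise(x, sigma=0.1):
    """First strategy: add random noise v drawn from a specific distribution (Gaussian here)."""
    return x + np.random.normal(0.0, sigma, size=x.shape)

def random_zeroing(x, ratio=0.5):
    """Second strategy: randomly set roughly the given ratio of input elements to 0."""
    mask = np.random.rand(*x.shape) >= ratio     # keeps about (1 - ratio) of the elements
    return x * mask

x = np.array([1.334, 1.335, 3.555, 1.1])
print(random_zeroing(x, ratio=0.5))              # e.g. [0., 1.335, 3.555, 0.] -> 2/4 = 50% zeroed
```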
Specifically, in step S3, the autoencoder algorithm can be regarded as an advanced version of the PCA algorithm; it was initially used mainly to pre-train neural network parameters, was later gradually applied to data dimensionality reduction and representation learning, and has found increasingly wide application in fields such as image compression, information extraction, structure mining, and outlier detection.
The autoencoder architecture consists mainly of two parts. As shown in fig. 5, the left half is the encoder, the right half is the decoder, and the middle is the self-encoding result, i.e. the hidden (latent) variable. The whole network is a neural network whose intermediate layers first narrow and then widen again, so that the variable information in the data is extracted through an information-bottleneck-like effect. The encoder compresses and projects the input into the latent variable space, which can be written as the function h = f(x); the decoder projects the latent value h back into the sample space, which can be written as r = g(h); the whole computation can be written as g(f(x)) = x'. As shown in fig. 6, the algorithm requires the input value x and the output value x' to be as close as possible.
The model adopts Mean Square Error (MSE) to reconstruct a loss function, and the loss function to be optimized is as follows:
L(x, x') = ||x - x'||² = ||x - σ'(W'(σ(Wx + b)) + b')||²
where L denotes the loss function; x denotes the input data, i.e. the detection item results in the report sheet; x' denotes the model output, i.e. the reconstructed detection item results; W, b denote the parameters to be learned in the encoder; W', b' denote the parameters to be learned in the decoder; σ and σ' denote sigmoid functions used for nonlinear transformation.
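A minimal sketch of this encoder/decoder structure and the MSE reconstruction loss, written with PyTorch as an assumed framework (the embodiment does not prescribe one); the layer widths are illustrative and the two sigmoid non-linearities correspond to σ and σ' in the formula:

```python
import torch
import torch.nn as nn

class DenoisingAutoEncoder(nn.Module):
    def __init__(self, n_inputs=2000, n_hidden=64):
        super().__init__()
        # encoder: h = sigma(W x + b), narrowing to the latent (hidden-variable) width
        self.encoder = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.Sigmoid())
        # decoder: x' = sigma'(W' h + b'), widening back to the sample space
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_inputs), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)        # h = f(x)
        return self.decoder(h)     # x' = g(h)

model = DenoisingAutoEncoder()
loss_fn = nn.MSELoss()             # mean squared error L(x, x') = ||x - x'||^2 (averaged)
```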
At present the autoencoder has several evolved versions, each of which is used in this method: the original version, Vanilla AutoEncoder; the sparse version, Sparse AutoEncoder; the multi-layer convolutional network version, Convolution/Deconvolution AutoEncoder; and the recurrent neural network version, LSTM AutoEncoder. It should be noted that these versions are all existing models; the structure and working process of each model can be found in the prior art and is not repeated here. The purpose of using these versions in the embodiment of the invention is to select a suitable coding model for each of the different schemes (one to four). In addition, the loss function above applies to all versions except the sparse version; the loss function of the sparse version can refer to that of the sparse autoencoder in the prior art and is not repeated here.
Illustratively, when the precoding manner of scheme one is adopted in step S1, the self-coding model adopted is Vanilla AutoEncoder or Sparse AutoEncoder; when the precoding manner of scheme two is adopted in step S1, the self-coding model adopted is LSTM AutoEncoder; when the precoding manner of scheme three or scheme four is adopted in step S1, the self-coding model adopted is Convolution/Deconvolution AutoEncoder.
Specifically, in step S4, the model is optimized with an optimizer, such as the conventional stochastic gradient descent algorithm (SGD).
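Continuing the sketch above (reusing `model` and `loss_fn`), a single optimization step with the stochastic gradient descent optimizer named here; the learning rate is an assumed hyper-parameter:

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def train_step(clean_batch, noisy_batch):
    """One SGD step: reconstruct the clean input from its noise-corrupted version."""
    optimizer.zero_grad()
    reconstruction = model(noisy_batch)
    loss = loss_fn(reconstruction, clean_batch)
    loss.backward()
    optimizer.step()
    return loss.item()    # monitored across steps; training stops once this value stabilizes
```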
Specifically, in step S5, when the loss function of the optimized self-coding model has stabilized, the decoder in the self-coding model is removed; since this scheme only needs to encode the detection results, only the part of the self-coding model containing the encoder needs to be retained, and the self-coding model with the decoder removed is output as the trained feature expression model.
For a new report sheet, the detection results are input directly into the trained feature expression model to obtain the feature representation data of the report sheet, i.e. h = f(x). The feature representation of the detection results, combined with the other encoded fields of the report sheet, can then be used for detecting abnormal values in the report sheet, clustering report sheets, analyzing the relevance of report-sheet diseases, and the like.
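A sketch of how the retained encoder could then be applied to a new report sheet (again reusing `model` from the sketch above); `precode` stands for whichever of schemes one to four was used during training and is a hypothetical helper here:

```python
import torch

feature_model = model.encoder     # decoder removed: only the encoder part is kept
feature_model.eval()

def report_features(report_results):
    """Feature representation h = f(x) of a new report sheet's detection results."""
    x = torch.as_tensor(precode(report_results), dtype=torch.float32)   # hypothetical precoding helper
    with torch.no_grad():
        return feature_model(x)
```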
Compared with the prior art, in the self-coding model training method disclosed by the embodiment of the invention, the detection result is first precoded and noise is added to the precoded detection result; the noise-added detection result is then input into a preset self-coding model, which is optimized with an optimizer; finally, when the loss function of the optimized self-coding model has stabilized, the decoder is removed and the self-coding model without the decoder is output as the trained feature expression model. The trained feature expression model facilitates exploration of the report-sheet detection result space and addresses problems such as incomplete feature-variable coverage and low efficiency in manually constructed features of medical examination report results.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a self-coding model training apparatus 10 according to an embodiment of the present invention; the self-encoding model training apparatus 10 includes:
a pre-coding module 11, configured to obtain a detection result in the report, and pre-code the detection result;
a noise adding module 12, configured to add noise to the detection result after precoding;
a detection result input module 13, configured to input the detection result after adding noise into a preset self-coding model; wherein the self-encoding model comprises an encoder and a decoder;
a self-coding model optimization module 14 for optimizing the self-coding model with an optimizer;
and the feature expression module 15 is configured to remove the decoder in the self-coding model when the loss function of the optimized self-coding model is stable, so as to output the self-coding model with the decoder removed as a trained feature expression model.
Preferably, the noise adding module 12 is specifically configured to: and adding random noise which obeys specific distribution to the detection result subjected to precoding.
Preferably, there are four ways for precoding the detection result in the precoding module 11, which are respectively: vector dimension encoding, time dimension encoding, matrix dimension encoding, tensor dimension encoding.
The first scheme is as follows: transversely arranging the detection results according to preset detection item codes; the detection result corresponding to the detection item which is not detected currently is empty, and the position of the detection result in the arrangement is reserved; the codes of the detection items, namely the unique identifications of the detection items in a laboratory, are generally arranged in order, so that the writing and reading back of the program coding results are convenient.
Scheme II: sequencing the detection results according to the time for generating the detection results; but the items without detection result need to be eliminated.
The third scheme is as follows: the detection results are arranged according to a preset arrangement rule; the preset arrangement rule performs a hierarchical division according to the category, department and/or subject of the detection item corresponding to each detection result. Specifically, the detection results are arranged as a two-dimensional table. Because the results of different detection items are correlated, an unreasonable arrangement of the detection items in the two-dimensional table may hinder the neural network from extracting the relevant information, so the arrangement rule must be designed carefully: detection items with obvious relevance should be placed in spatially adjacent positions as far as possible.
The fourth scheme is as follows: the detection results are arranged according to a preset three-dimensional model; the three-dimensional model is presented as a three-dimensional table (tensor) comprising a plurality of slices representing different detection packages, each slice containing a plurality of detection results.
It should be noted that, for the working processes of the specific schemes one to four, please refer to the precoding processing process in the self-coding model training method in the foregoing embodiment, which is not described herein again. For the specific working process of each module in the self-coding model training device 10, please refer to the process in the self-coding model training method described in the above embodiment, which is not described herein again.
Specifically, the noise adding module 12 adds noise to the input data using a denoising autoencoder (Denoising AutoEncoder) approach. Current noise-adding methods mostly fall into two types: one is to add random noise v obeying a specific distribution, such as a Gaussian distribution, to the precoded detection result; the other is to randomly set a certain ratio of the input x to 0. The ratio here refers to the proportion of elements in the detection result that become 0, for example [1.334, 1.335, 3.555, 1.1] → [0, 1.335, 3.555, 0], where two values become 0 and the ratio is 2/4 = 50%. The specific ratio is an empirical parameter, tuned according to the size of the actual data and the experimental results.
Specifically, the autoencoder algorithm can be regarded as an advanced version of the PCA algorithm; it was initially used mainly to pre-train neural network parameters, was later gradually applied to data dimensionality reduction and representation learning, and has found increasingly wide application in fields such as image compression, information extraction, structure mining, and outlier detection.
The autoencoder architecture consists mainly of two parts. As shown in fig. 5, the left half is the encoder, the right half is the decoder, and the middle is the self-encoding result, i.e. the hidden (latent) variable. The whole network is a neural network whose intermediate layers first narrow and then widen again, so that the variable information in the data is extracted through an information-bottleneck-like effect. The encoder compresses and projects the input into the latent variable space, which can be written as the function h = f(x); the decoder projects the latent value h back into the sample space, which can be written as r = g(h); the whole computation can be written as g(f(x)) = x'. As shown in fig. 6, the algorithm requires the input value x and the output value x' to be as close as possible.
The model adopts Mean Square Error (MSE) to reconstruct a loss function, and the loss function to be optimized is as follows:
L(x, x') = ||x - x'||² = ||x - σ'(W'(σ(Wx + b)) + b')||²
where L denotes the loss function; x denotes the input data, i.e. the detection item results in the report sheet; x' denotes the model output, i.e. the reconstructed detection item results; W, b denote the parameters to be learned in the encoder; W', b' denote the parameters to be learned in the decoder; σ and σ' denote sigmoid functions used for nonlinear transformation.
At present the autoencoder has several evolved versions, each of which is used in the embodiment of the invention: the original version, Vanilla AutoEncoder; the sparse version, Sparse AutoEncoder; the multi-layer convolutional network version, Convolution/Deconvolution AutoEncoder; and the recurrent neural network version, LSTM AutoEncoder. It should be noted that these versions are all existing models; the structure and working process of each model can be found in the prior art and is not repeated here. The purpose of using these versions in the embodiment of the invention is to select a suitable coding model for each of the different schemes (one to four). In addition, the loss function above applies to all versions except the sparse version; the loss function of the sparse version can refer to that of the sparse autoencoder in the prior art and is not repeated here.
Illustratively, when the pre-coding module 11 adopts the precoding manner of scheme one, the self-coding model adopted is Vanilla AutoEncoder or Sparse AutoEncoder; when the pre-coding module 11 adopts the precoding manner of scheme two, the self-coding model adopted is LSTM AutoEncoder; when the pre-coding module 11 adopts the precoding manner of scheme three or scheme four, the self-coding model adopted is Convolution/Deconvolution AutoEncoder.
Specifically, the model is optimized with an optimizer, for example the common stochastic gradient descent algorithm (SGD).
Specifically, when the loss function of the optimized self-coding model has stabilized, the feature expression module 15 removes the decoder in the self-coding model; since this scheme only needs to encode the detection results, only the part of the self-coding model containing the encoder needs to be retained, and the self-coding model with the decoder removed is output as the trained feature expression model.
For a new report sheet, the detection results are input directly into the trained feature expression model to obtain the feature representation data of the report sheet, i.e. h = f(x). The feature representation of the detection results, combined with the other encoded fields of the report sheet, can then be used for detecting abnormal values in the report sheet, clustering report sheets, analyzing the relevance of report-sheet diseases, and the like.
Compared with the prior art, in the self-coding model training device 10 disclosed by the embodiment of the invention, the pre-coding module 11 first precodes the detection result and the noise adding module 12 adds noise to the precoded detection result; the detection result input module 13 then inputs the noise-added detection result into a preset self-coding model, and the self-coding model optimization module 14 optimizes the self-coding model with an optimizer; finally, when the loss function of the optimized self-coding model has stabilized, the feature expression module 15 removes the decoder and outputs the self-coding model without the decoder as the trained feature expression model. The trained feature expression model facilitates exploration of the report-sheet detection result space and addresses problems such as incomplete feature-variable coverage and low efficiency in manually constructed features of medical examination report results.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a self-coding model training apparatus 20 according to an embodiment of the present invention; the self-encoding model training apparatus 20 of this embodiment includes: a processor 21, a memory 22 and a computer program stored in said memory 22 and executable on said processor 21. The processor 21, when executing the computer program, implements the steps in the above-mentioned embodiments of the self-coding model training method, such as step S1 shown in fig. 1. Alternatively, the processor 21, when executing the computer program, implements the functions of the modules/units in the above-mentioned device embodiments, such as the pre-coding module 11.
Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory 22 and executed by the processor 21 to accomplish the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which describe the execution of the computer program in the self-encoding model training device 20. For example, the computer program may be divided into the pre-coding module 11, the noise adding module 12, the detection result input module 13, the self-coding model optimization module 14, and the feature expression module 15; for the specific function of each module, refer to the working process of the self-coding model training apparatus 10 described in the foregoing embodiment, which is not repeated here.
The self-coding model training device 20 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The self-coding model training device 20 may include, but is not limited to, a processor 21 and a memory 22. It will be understood by those skilled in the art that the schematic diagram is merely an example of the self-coding model training device 20, does not constitute a limitation of the self-coding model training device 20, and may include more or less components than those shown, or combine some components, or different components, for example, the self-coding model training device 20 may further include an input-output device, a network access device, a bus, etc.
The Processor 21 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general purpose processor may be a microprocessor, or the processor may be any conventional processor; the processor 21 is the control center of the self-encoding model training apparatus 20 and connects the various parts of the whole self-encoding model training apparatus 20 through various interfaces and lines.
The memory 22 may be used for storing the computer programs and/or modules, and the processor 21 implements the various functions of the self-coding model training apparatus 20 by running or executing the computer programs and/or modules stored in the memory 22 and calling data stored in the memory 22. The memory 22 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function (such as a sound playing function, an image playing function, etc.); the data storage area may store data created according to the use of the device (such as audio data, a phonebook, etc.). In addition, the memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash memory card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or another non-volatile solid-state storage device.
Wherein, the modules/units integrated by the self-coding model training device 20 can be stored in a computer readable storage medium if they are implemented in the form of software functional units and sold or used as independent products. Based on such understanding, all or part of the flow of the method according to the above embodiments may be implemented by a computer program, which may be stored in a computer readable storage medium and used by the processor 21 to implement the steps of the above embodiments of the method. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, etc. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A method for training a self-coding model, comprising:
acquiring a detection result in a report sheet, and precoding the detection result;
adding noise to the detection result after precoding;
inputting the detection result added with the noise into a preset self-coding model; wherein the self-encoding model comprises an encoder and a decoder;
optimizing the self-encoding model with an optimizer;
when the loss function of the optimized self-coding model is stable, removing the decoder in the self-coding model, and outputting the self-coding model with the decoder removed as a trained feature expression model.
2. The method for training a self-coding model according to claim 1, wherein the pre-coding the detection result specifically comprises:
transversely arranging the detection results according to preset detection item codes; the detection result corresponding to the detection item which is not detected currently is empty, and the position of the detection result in the arrangement is reserved;
then, the self-coding model is Vanilla AutoEncoder or Sparse AutoEncoder.
3. The method for training a self-coding model according to claim 1, wherein the pre-coding the detection result specifically comprises:
sequencing the detection results according to the time for generating the detection results;
then, the self-encoding model is LSTM AutoEncoder.
4. The method for training a self-coding model according to claim 1, wherein the pre-coding the detection result specifically comprises:
arranging the detection results according to a preset arrangement rule; the preset arrangement rule is used for carrying out hierarchical division according to the category, department and/or subject of the detection item corresponding to the detection result;
then, the self-coding model is Convolution/Deconvolution AutoEncoder.
5. The method for training a self-coding model according to claim 1, wherein the pre-coding the detection result specifically comprises:
sequencing the detection results according to a preset three-dimensional model; wherein the three-dimensional model comprises a plurality of slices representing different test packages, each of the slices comprising a plurality of the test results;
then, the self-coding model is Convolution/Deconvolution AutoEncoder.
6. The method for training a self-coding model according to claim 1, wherein the adding noise to the detection result after performing pre-coding specifically comprises:
and adding random noise which obeys specific distribution to the detection result subjected to precoding.
7. A self-coding model training apparatus, comprising:
the pre-coding module is used for acquiring the detection result in the report list and pre-coding the detection result;
a noise adding module, configured to add noise to the detection result after precoding;
the detection result input module is used for inputting the detection result added with the noise into a preset self-coding model; wherein the self-encoding model comprises an encoder and a decoder;
a self-coding model optimization module for optimizing the self-coding model using an optimizer;
and the characteristic expression module is used for removing the decoder in the self-coding model when the loss function of the optimized self-coding model is stable so as to output the self-coding model with the decoder removed as the trained characteristic expression model.
8. The self-coding model training device of claim 7, wherein the noise addition module is specifically configured to:
and adding random noise which obeys specific distribution to the detection result subjected to precoding.
9. A self-coding model training device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the self-coding model training method of any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the self-coding model training method according to any one of claims 1 to 6.
CN201911065670.0A 2019-11-04 2019-11-04 Self-coding model training method, device, equipment and storage medium Pending CN110910982A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911065670.0A CN110910982A (en) 2019-11-04 2019-11-04 Self-coding model training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911065670.0A CN110910982A (en) 2019-11-04 2019-11-04 Self-coding model training method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110910982A true CN110910982A (en) 2020-03-24

Family

ID=69815877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911065670.0A Pending CN110910982A (en) 2019-11-04 2019-11-04 Self-coding model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110910982A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111489802A (en) * 2020-03-31 2020-08-04 重庆金域医学检验所有限公司 Report coding model generation method, system, device and storage medium
CN111489803A (en) * 2020-03-31 2020-08-04 重庆金域医学检验所有限公司 Report coding model generation method, system and equipment based on autoregressive model
CN111599431A (en) * 2020-03-31 2020-08-28 太原金域临床检验有限公司 Report sheet-based data coding model generation method, system and equipment
CN111613287A (en) * 2020-03-31 2020-09-01 武汉金域医学检验所有限公司 Report coding model generation method, system and equipment based on Glow network
CN112036513A (en) * 2020-11-04 2020-12-04 成都考拉悠然科技有限公司 Image anomaly detection method based on memory-enhanced potential spatial autoregression
CN112598328A (en) * 2021-01-05 2021-04-02 中国人民解放军国防科技大学 Optimization method and system for multi-target distribution of transfer boarding gates in satellite hall mode
CN112635001A (en) * 2020-12-21 2021-04-09 山东众阳健康科技集团有限公司 ICD (interface control document) encoded data processing method, system, storage medium and equipment
CN112734669A (en) * 2021-01-07 2021-04-30 苏州浪潮智能科技有限公司 Training method of anomaly detection model based on improved noise reduction self-encoder
CN113779236A (en) * 2021-08-11 2021-12-10 齐维维 Method and device for problem classification based on artificial intelligence
CN115185805A (en) * 2022-09-13 2022-10-14 浪潮电子信息产业股份有限公司 Performance prediction method, system, equipment and storage medium of storage system
CN115250199A (en) * 2022-07-15 2022-10-28 北京六方云信息技术有限公司 Data stream detection method and device, terminal equipment and storage medium
CN115293663A (en) * 2022-10-10 2022-11-04 国网山东省电力公司滨州供电公司 Bus unbalance rate abnormity detection method, system and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009013A (en) * 2019-03-21 2019-07-12 腾讯科技(深圳)有限公司 Encoder training and characterization information extracting method and device
CN110059357A (en) * 2019-03-19 2019-07-26 中国电力科学研究院有限公司 A kind of intelligent electric energy meter failure modes detection method and system based on autoencoder network
WO2019185987A1 (en) * 2018-03-29 2019-10-03 Nokia Technologies Oy Entropy-friendly neural network representations and their use in training and using neural networks such as autoencoders

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019185987A1 (en) * 2018-03-29 2019-10-03 Nokia Technologies Oy Entropy-friendly neural network representations and their use in training and using neural networks such as autoencoders
CN110059357A (en) * 2019-03-19 2019-07-26 中国电力科学研究院有限公司 A kind of intelligent electric energy meter failure modes detection method and system based on autoencoder network
CN110009013A (en) * 2019-03-21 2019-07-12 腾讯科技(深圳)有限公司 Encoder training and characterization information extracting method and device

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111489802A (en) * 2020-03-31 2020-08-04 重庆金域医学检验所有限公司 Report coding model generation method, system, device and storage medium
CN111489803A (en) * 2020-03-31 2020-08-04 重庆金域医学检验所有限公司 Report coding model generation method, system and equipment based on autoregressive model
CN111599431A (en) * 2020-03-31 2020-08-28 太原金域临床检验有限公司 Report sheet-based data coding model generation method, system and equipment
CN111613287A (en) * 2020-03-31 2020-09-01 武汉金域医学检验所有限公司 Report coding model generation method, system and equipment based on Glow network
CN112036513A (en) * 2020-11-04 2020-12-04 成都考拉悠然科技有限公司 Image anomaly detection method based on memory-enhanced potential spatial autoregression
CN112635001A (en) * 2020-12-21 2021-04-09 山东众阳健康科技集团有限公司 ICD (interface control document) encoded data processing method, system, storage medium and equipment
CN112635001B (en) * 2020-12-21 2023-04-07 山东众阳健康科技集团有限公司 ICD (interface control document) encoded data processing method, system, storage medium and equipment
CN112598328A (en) * 2021-01-05 2021-04-02 中国人民解放军国防科技大学 Optimization method and system for multi-target distribution of transfer boarding gates in satellite hall mode
CN112734669A (en) * 2021-01-07 2021-04-30 苏州浪潮智能科技有限公司 Training method of anomaly detection model based on improved noise reduction self-encoder
CN112734669B (en) * 2021-01-07 2022-12-02 苏州浪潮智能科技有限公司 Training method of anomaly detection model based on improved noise reduction self-encoder
CN113779236A (en) * 2021-08-11 2021-12-10 齐维维 Method and device for problem classification based on artificial intelligence
CN115250199A (en) * 2022-07-15 2022-10-28 北京六方云信息技术有限公司 Data stream detection method and device, terminal equipment and storage medium
CN115185805A (en) * 2022-09-13 2022-10-14 浪潮电子信息产业股份有限公司 Performance prediction method, system, equipment and storage medium of storage system
CN115185805B (en) * 2022-09-13 2023-01-24 浪潮电子信息产业股份有限公司 Performance prediction method, system, equipment and storage medium of storage system
CN115293663A (en) * 2022-10-10 2022-11-04 国网山东省电力公司滨州供电公司 Bus unbalance rate abnormity detection method, system and device

Similar Documents

Publication Publication Date Title
CN110910982A (en) Self-coding model training method, device, equipment and storage medium
Pezzotti et al. Deepeyes: Progressive visual analytics for designing deep neural networks
Beckett et al. FALCON: a software package for analysis of nestedness in bipartite networks
CN111710364B (en) Method, device, terminal and storage medium for acquiring flora marker
CN111931809A (en) Data processing method and device, storage medium and electronic equipment
CN112257578A (en) Face key point detection method and device, electronic equipment and storage medium
CN110647995A (en) Rule training method, device, equipment and storage medium
CN111338897A (en) Identification method of abnormal node in application host, monitoring equipment and electronic equipment
CN116361801A (en) Malicious software detection method and system based on semantic information of application program interface
CN108304322B (en) Pressure testing method and terminal equipment
CN112527676A (en) Model automation test method, device and storage medium
Vieira et al. A step-by-step tutorial on how to build a machine learning model
CN115730947A (en) Bank customer loss prediction method and device
CN111178701A (en) Risk control method and device based on feature derivation technology and electronic equipment
CN110970100A (en) Method, device and equipment for detecting item coding and computer readable storage medium
JP2023139296A (en) Signal processing method, signal processing apparatus, and signal processing program
CN113824580A (en) Network index early warning method and system
CN104331507B (en) Machine data classification is found automatically and the method and device of classification
CN110544166A (en) Sample generation method, device and storage medium
CN113962335B (en) Flexibly configurable data whole-process processing method
CN115686995A (en) Data monitoring processing method and device
Cecil et al. On convolutional neural networks for selection inference: revealing the lurking role of preprocessing, and the surprising effectiveness of summary statistics
CN110472292B (en) Industrial equipment data simulation configuration system and method
CN108108371A (en) A kind of file classification method and device
CN116451087B (en) Character matching method, device, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200324