CN107845408B - Data evaluation method and device, storage medium and electronic device - Google Patents

Info

Publication number
CN107845408B
Authority
CN
China
Prior art keywords
data
key information
data sample
result
scoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711006245.5A
Other languages
Chinese (zh)
Other versions
CN107845408A (en
Inventor
杜硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yidu Cloud Beijing Technology Co Ltd
Original Assignee
Yidu Cloud Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yidu Cloud Beijing Technology Co Ltd
Priority to CN201711006245.5A
Publication of CN107845408A
Application granted
Publication of CN107845408B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data evaluation method and device, a storage medium and an electronic device, and relates to the technical field of data processing. The data evaluation method comprises the following steps: determining key information in the data; acquiring a preset number of data samples and providing them to a grader, who scores the data samples to obtain a scoring result for each data sample; judging whether the key information exists in each data sample and generating a key information vector for each data sample according to the judgment result; training on the scoring results and the key information vectors through a regression algorithm to generate a training model; and acquiring data to be evaluated and running the training model on the data to be evaluated to obtain a scoring result of the data to be evaluated as the evaluation result. The data evaluation method and device reduce manual participation while improving the stability and efficiency of data evaluation.

Description

Data evaluation method and device, storage medium and electronic device
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data evaluation method, a data evaluation device, a storage medium, and an electronic device.
Background
With the development of the information society, big data concepts have permeated various industries. In the field of big data, sorting out data that is complete and targeted has become a key to big data analysis; in this process, the data needs to be evaluated to determine whether it meets the analysis requirements.
Taking medical record data in the medical industry as an example, as hospitals accumulate more patient data, summarizing and organizing medical record data becomes increasingly important for making better use of historical treatment experience. At present, however, hospitals organize medical record data mainly by having relevant personnel (for example, medical record room staff) sort and classify it manually. On one hand, this requires continuous human input and is inefficient; on the other hand, because of the limits of their professional fields, it is difficult for such personnel to evaluate medical record data for various diseases from a professional perspective.
In view of the above, a data evaluation method, a data evaluation apparatus, a storage medium, and an electronic device are required.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the present disclosure is to provide a data evaluation method, a data evaluation apparatus, a storage medium, and an electronic device, thereby overcoming, at least to some extent, one or more problems due to limitations and disadvantages of the related art.
According to an aspect of the present disclosure, there is provided a data evaluation method including:
determining key information in the data;
acquiring a preset number of data samples and providing the data samples to a grader, so that the grader scores the data samples to obtain a scoring result for each data sample;
judging whether the key information exists in each data sample, and generating a key information vector for each data sample according to the judgment result;
training on the scoring results and the key information vectors through a regression algorithm to generate a training model; and
acquiring data to be evaluated, and running the training model on the data to be evaluated to obtain a scoring result of the data to be evaluated as an evaluation result.
In an exemplary embodiment of the present disclosure, training the scoring results and the key information vectors through a regression algorithm includes:
performing fitting analysis on the scoring result and the key information vector of each data sample through a regression algorithm to obtain a score of each data sample; and
determining a score threshold for grading the scores according to the scoring result of each data sample and the score of each data sample.
In an exemplary embodiment of the present disclosure, the key information includes first key information and/or second key information;
the first key information comprises information based on data integrity, and the second key information comprises index information corresponding to the evaluation dimension.
In an exemplary embodiment of the present disclosure, determining key information in the data includes:
determining the first key information according to a preset data specification; and/or
determining index information designated by the grader as the second key information.
According to an aspect of the present disclosure, there is provided a data evaluation apparatus including:
a key information determining module, configured to determine key information in the data;
a data sample scoring module, configured to acquire a preset number of data samples and provide the data samples to a grader, so that the grader scores the data samples to obtain a scoring result for each data sample;
a key information vector generating module, configured to judge whether the key information exists in each data sample and to generate a key information vector for each data sample according to the judgment result;
a training model generation module, configured to train on the scoring results and the key information vectors through a regression algorithm to generate a training model; and
a data evaluation module, configured to acquire data to be evaluated and run the training model on the data to be evaluated to obtain a scoring result of the data to be evaluated as an evaluation result.
In an exemplary embodiment of the present disclosure, the training model generation module includes:
a score obtaining unit, configured to perform fitting analysis on the scoring result and the key information vector of each data sample through a regression algorithm to obtain the score of each data sample; and
a score threshold determining unit, configured to determine a score threshold for grading the scores according to the scoring result of each data sample and the score of each data sample.
In an exemplary embodiment of the present disclosure, the key information includes first key information and/or second key information;
the first key information comprises information based on data integrity, and the second key information comprises index information corresponding to the evaluation dimension.
In an exemplary embodiment of the present disclosure, the key information determining module includes:
a first key information determining unit, configured to determine the first key information according to a preset data specification; and/or
a second key information determining unit, configured to determine the index information designated by the grader as the second key information.
According to an aspect of the present disclosure, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the data evaluation method of any one of the above.
According to an aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform any of the data evaluation methods described above via execution of the executable instructions.
According to the technical solutions provided by some embodiments of the present disclosure, the scoring result of each data sample is obtained, a key information vector is generated for each data sample, the scoring results and the key information vectors are trained through a regression algorithm, and the data to be evaluated is evaluated with the generated training model. First, once a grader has manually scored the data samples, no further manual participation is needed, which solves the prior-art problem that data evaluation requires continuous manual input. Second, reducing human participation reduces the instability in data evaluation caused by the subjective factors of individual graders. Third, reducing human participation improves the efficiency of data evaluation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 schematically illustrates a flow chart of a data evaluation method according to an exemplary embodiment of the present disclosure;
FIG. 2 schematically illustrates a flowchart of an overall process of one embodiment of a data evaluation method according to an exemplary embodiment of the present disclosure;
FIG. 3 schematically illustrates a block diagram of a data evaluation device according to an exemplary embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of a training model generation module according to an exemplary embodiment of the present disclosure;
FIG. 5 schematically illustrates a block diagram of a key information determination module according to an exemplary embodiment of the present disclosure;
FIG. 6 shows a schematic diagram of a storage medium according to an example embodiment of the present disclosure; and
FIG. 7 schematically shows a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
The data evaluation method of the present disclosure will be described below by taking medical record data in the medical industry as an example. However, without being limited thereto, it should be understood that the data evaluation methods described in the present disclosure may also be applied to other fields, such as data evaluation of data in questionnaires, data evaluation of data in business reports, and the like.
FIG. 1 schematically shows a flow chart of a data evaluation method of an exemplary embodiment of the present disclosure. Referring to FIG. 1, the data evaluation method may include the following steps:
S10, determining key information in the data;
S12, acquiring a preset number of data samples and providing the data samples to a grader, so that the grader scores the data samples to obtain a scoring result for each data sample;
S14, judging whether the key information exists in each data sample, and generating a key information vector for each data sample according to the judgment result;
S16, training on the scoring results and the key information vectors through a regression algorithm to generate a training model; and
S18, acquiring data to be evaluated, and running the training model on the data to be evaluated to obtain a scoring result of the data to be evaluated as an evaluation result.
According to the data evaluation method of the exemplary embodiment of the present disclosure, the scoring result of each data sample is obtained, a key information vector is generated for each data sample, the scoring results and the key information vectors are trained through a regression algorithm, and the data to be evaluated is evaluated with the generated training model. First, once a grader has manually scored the data samples, no further manual participation is needed, which solves the prior-art problem that data evaluation requires continuous manual input. Second, reducing human participation reduces the instability in data evaluation caused by the subjective factors of individual graders. Third, reducing human participation improves the efficiency of data evaluation.
The data evaluation method of the exemplary embodiment of the present disclosure will be described in detail below.
In step S10, key information in the data is determined.
In an exemplary embodiment of the present disclosure, the key information may include first key information and/or second key information. Specifically, the first key information may include information based on data integrity, and the second key information may include index information corresponding to the evaluation dimension. The evaluation dimension here refers to the aspect from which a user wishes to evaluate the data; for example, the evaluation may target a particular item contained in the data.
Determining key information in the data may include determining first key information according to a preset data specification and/or determining index information designated by a grader as second key information.
Taking medical record data in the medical industry as an example, the first key information may include information determined according to, for example, the basic medical record writing specification. For example, a developer may enter the basic specification information contained in the basic medical record writing specification into a storage unit of the system in advance, and the server may extract this information from the storage unit. Such basic specification information may include, but is not limited to, patient name, patient gender, patient age, visit time, symptom description, treatment regimen, doctor signature, and the like. In addition, the grader and/or the developer may define the data items that a medical record should contain according to the data integrity requirement, which is not particularly limited in the exemplary embodiment.
The second key information may be index information specified by a grader according to a clinical pathway, a medical guideline, or the like, and the index information may include information relevant to the treatment of a specific disease. Taking the evaluation of hypertension medical records as an example, the second key information may include index information related to hypertension, such as systolic pressure, diastolic pressure, and the like.
It is easily understood that in the case of determining only the integrity of medical record data, only the first key information may be determined, and in the case of medical record data for a certain disease, in addition to the first key information, the second key information corresponding to the disease may also need to be determined.
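As an illustration only, the two kinds of key information could be merged into one ordered list before any samples are examined. The sketch below is in Python; the names `BASIC_SPEC_ITEMS`, `HYPERTENSION_ITEMS`, and `determine_key_information` are hypothetical and do not come from the disclosure, and the item lists simply reuse the examples given above.

```python
# Hypothetical first key information, taken from the examples in the text
# (items a basic medical record writing specification might require).
BASIC_SPEC_ITEMS = [
    "patient name", "patient gender", "patient age", "visit time",
    "symptom description", "treatment regimen", "doctor signature",
]

# Hypothetical second key information a grader might designate for hypertension.
HYPERTENSION_ITEMS = ["systolic pressure", "diastolic pressure"]


def determine_key_information(spec_items, grader_items=()):
    """Return first key information followed by second key information.

    The order is kept stable because each position later becomes one
    component of the key information vector.
    """
    key_items = list(spec_items)
    for item in grader_items:
        if item not in key_items:  # avoid counting the same item twice
            key_items.append(item)
    return key_items


# Integrity-only evaluation uses the first key information alone;
# disease-specific evaluation adds the second key information.
key_items = determine_key_information(BASIC_SPEC_ITEMS, HYPERTENSION_ITEMS)
```

Keeping the list ordered is a design choice of this sketch: it lets the same positions be reused for every data sample and for the data to be evaluated later.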
In addition, in order to ensure the accuracy of the determined key information, the scoring personnel described in the present disclosure may include medical experts in various disease areas, physicians with abundant clinical experience, and the like. In addition, the medical records described above may be in-patient medical records or out-patient medical records, and the specific form of the medical records is not particularly limited in this disclosure.
In step S12, a preset number of data samples are obtained and provided to a grader, so that the grader grades each data sample to obtain a grading result of each data sample.
In an exemplary embodiment of the present disclosure, the grader may produce different kinds of scoring results depending on the scoring method. On one hand, the data samples may be graded by level, that is, divided into several levels; for example, the data samples may be graded on levels 1 to 5, where level 1 is the highest and level 5 is the lowest. Alternatively, level 5 may be set as the highest and level 1 as the lowest. Levels 1 to 5 may also be denoted by letters, for example A, B, C, D, E. It is readily understood that dividing the data samples into a number of levels other than 5 is also within the contemplation of the present disclosure; for example, the grader may divide the data samples into 3 levels, 10 levels, and so on. On the other hand, the data samples may be given a detailed numeric score; on a hundred-point scale, for example, a data sample may be scored 85 points. It is readily understood that scoring on a hundred-point scale requires the grader to establish more detailed scoring criteria.
In addition, the server may randomly extract a preset number of data samples from the data sources, and the data format of the data samples is not particularly limited in the present disclosure. The preset number may be set by a developer after weighing the total data volume, the workload of the grader, and the actual processing capacity of the server, which is not particularly limited in the exemplary embodiment.
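A minimal sketch of the random extraction, assuming the records are already available as a Python list; the function name `draw_samples`, the `seed` parameter, and the placeholder record identifiers are hypothetical.

```python
import random


def draw_samples(record_pool, preset_number, seed=None):
    """Randomly draw up to `preset_number` distinct records from the pool."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    # Never ask for more records than the pool actually holds.
    count = min(preset_number, len(record_pool))
    return rng.sample(record_pool, count)


# Placeholder pool standing in for records gathered from the data sources.
pool = [f"record-{i}" for i in range(5000)]
samples = draw_samples(pool, 1000, seed=42)
```

`random.sample` draws without replacement, so no record is handed to the grader twice.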
Still taking medical record data in the medical industry as an example, first, the server may extract a preset number of medical records from the data sources. The medical record data may come from the systems of individual hospital departments or from different hospitals; alternatively, the medical record data of all hospitals in a certain area may be pooled into a unified medical record data pool, from which the server extracts the preset number of records. The preset number may be set to, for example, 1000. Next, the server may provide the 1000 medical records to one or more medical experts for scoring, to obtain scoring results for the 1000 records. It should be noted that a medical expert may score with respect to a particular evaluation dimension (or evaluation angle, evaluation direction). For example, when the quality of hypertension medical record data is studied, a medical expert in the hypertension field may score only on the index information for hypertension; when hepatitis medical record data is studied, a medical expert in the field of liver disease may score only on the indices for liver disease; and so on. Taking level 1 to 5 grading as an example, the medical expert may mark each medical record with a level, and the mark is stored in the system; for example, if the expert considers a medical record complete and objective, the record may be marked as level 1, which serves as its scoring result. The server may then obtain the scoring result and store it in the system, so that the system holds the correspondence between that medical record and level 1.
In step S14, it is determined whether or not the key information exists in each of the data samples, and a key information vector for each of the data samples is generated based on the determination result.
In an exemplary embodiment of the present disclosure, it may be determined whether each data sample acquired in step S12 contains the first key information and/or the second key information determined in step S10. Specifically, first, the information contained in the data sample may be compared item by item with the first key information and/or the second key information. Next, taking a binary representation of the comparison result as an example, if an item of the key information is found in the data sample, the comparison result for that item may be set to 1; if an item of the key information is not present in the data sample, the comparison result may be set to 0. Thus, after the item-by-item comparison, a key information vector consisting of the comparison results (1 or 0) can be generated for the data sample. The server may then store the key information vectors in the system, forming a one-to-one correspondence between data samples and key information vectors.
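The item-by-item comparison can be sketched as follows. This assumes each data sample has already been parsed into a dictionary of field names to values, which the disclosure does not specify; `key_information_vector`, `KEY_ITEMS`, and the example record are hypothetical.

```python
KEY_ITEMS = [
    "patient name", "patient gender", "patient age", "visit time",
    "symptom description", "treatment regimen", "doctor signature",
    "systolic pressure", "diastolic pressure",
]


def key_information_vector(record, key_items):
    """Return one 1/0 flag per key item, in the order of `key_items`."""
    vector = []
    for item in key_items:
        value = record.get(item)
        # A missing field or an empty/whitespace-only value counts as absent.
        present = value is not None and str(value).strip() != ""
        vector.append(1 if present else 0)
    return vector


# Illustrative record: treatment regimen is blank, and the doctor signature
# and diastolic pressure fields are missing altogether.
record = {
    "patient name": "example patient",
    "patient gender": "F",
    "patient age": 63,
    "visit time": "2017-03-02",
    "symptom description": "headache",
    "treatment regimen": "",
    "systolic pressure": 150,
}
vector = key_information_vector(record, KEY_ITEMS)
```

For this record the vector is `[1, 1, 1, 1, 1, 0, 0, 1, 0]`, one component per key item.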
In addition, the developer may set a similarity threshold between the key information determined in step S10 and the information contained in the data sample. If an item in the data sample and an item of key information satisfy the similarity threshold, the two are considered the same, and the key information vector is obtained by comparing them one by one on that basis. The specific method of setting the similarity threshold is not particularly limited in this disclosure. In practice, attending physicians may introduce misspellings or abbreviations out of habit or by mistake, and configuring a similarity threshold can handle such cases effectively.
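The disclosure leaves the similarity measure open. One possible sketch uses the character-level ratio from Python's standard `difflib.SequenceMatcher`; the threshold value 0.9 and the names `is_same_item` and `fuzzy_key_information_vector` are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def is_same_item(key_item, candidate, threshold=0.9):
    """Treat two labels as the same when their similarity ratio meets the threshold."""
    ratio = SequenceMatcher(None, key_item.lower(), candidate.lower()).ratio()
    return ratio >= threshold


def fuzzy_key_information_vector(field_names, key_items, threshold=0.9):
    """Set a component to 1 when any field name is similar enough to the key item."""
    return [
        1 if any(is_same_item(key, name, threshold) for name in field_names) else 0
        for key in key_items
    ]


# "systolic presure" is a misspelling that an exact comparison would miss.
fields = ["Patient Name", "systolic presure", "visit time"]
keys = ["patient name", "systolic pressure", "diastolic pressure"]
fuzzy_vector = fuzzy_key_information_vector(fields, keys)
```

A threshold that is too low can conflate distinct items such as systolic and diastolic pressure, whose labels differ by only a few characters, so the value needs tuning against real records. A character ratio also does not resolve abbreviations that share few letters with the full term; an alias table would be one way to cover those.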
In step S16, the scoring results and the key information vectors are trained by a regression algorithm to generate a training model.
In an exemplary embodiment of the present disclosure, the scoring results obtained in step S12 and the key information vectors generated in step S14 may be fitted by a regression algorithm. The regression algorithm may be any existing regression algorithm, and the programming language in which it is implemented is not limited by the present disclosure. That is, the relationship between the scoring results and the key information vectors may be determined by means of a regression algorithm.
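Since the disclosure does not prescribe a particular regression algorithm, the following is only a sketch using ordinary least-squares linear regression fitted by batch gradient descent in plain Python. Two assumptions are made here and are not part of the source: expert grades from 1 (best) to 5 (worst) are mapped linearly onto targets in [0, 1] by `level_to_target`, and the toy training data is invented so that the fit can be checked by hand.

```python
def level_to_target(level, n_levels=5):
    """Map a grade (1 = best) onto [0, 1]. This mapping is an assumption."""
    return (n_levels - level) / (n_levels - 1)


def fit_linear_regression(vectors, targets, learning_rate=0.1, epochs=5000):
    """Fit score = bias + sum(w_j * x_j) by batch gradient descent on squared error."""
    n_features = len(vectors[0])
    n_samples = len(vectors)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(vectors, targets):
            error = bias + sum(w * xi for w, xi in zip(weights, x)) - y
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        bias -= learning_rate * grad_b / n_samples
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
    return weights, bias


def predict_score(vector, weights, bias):
    """Machine-learned score for one key information vector."""
    return bias + sum(w * xi for w, xi in zip(weights, vector))


# Toy data: three key items, every presence/absence pattern, invented grades.
training_vectors = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 0, 0],
                    [0, 1, 1], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
expert_levels = [1, 2, 2, 3, 3, 4, 4, 5]
training_targets = [level_to_target(level) for level in expert_levels]

weights, bias = fit_linear_regression(training_vectors, training_targets)
```

In this toy data the first key item carries twice the weight of the others, and the fitted weights recover that relationship. In practice a library implementation of regression could be substituted without changing the surrounding steps.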
According to some embodiments of the present disclosure, first, a regression algorithm may be used to perform fitting analysis on the scoring results and the key information vectors, yielding a machine-learned score for each data sample. The score may be expressed as a decimal; for example, a machine-learned score may be 0.98. After the scores of the data samples are obtained, they may be sorted in descending order, or alternatively in ascending order, which is not particularly limited in the exemplary embodiment. Next, score thresholds for dividing the scores into grades may be determined from the sorted scores and the scoring results obtained in step S12.
Still taking medical record data in the medical industry as an example, after the scores of the 1000 medical records are determined, the 1000 scores may be sorted in descending order. Next, taking level 1 to 5 grading as an example and based on the specific values of the 1000 scores, data with a score above 0.9 may be assigned to level 1, data with a score of 0.9 or less and above 0.7 to level 2, data with a score of 0.7 or less and above 0.6 to level 3, data with a score of 0.6 or less and above 0.4 to level 4, and data with a score of 0.4 or less to level 5. Thus, the score threshold between level 1 and level 2 may be determined as 0.9, the score threshold between level 2 and level 3 as 0.7, the score threshold between level 3 and level 4 as 0.6, and the score threshold between level 4 and level 5 as 0.4. In summary, a relationship between the machine-learned score of a data sample and the key information contained in the data sample can be obtained, and in some embodiments this relationship can be regarded as the training model generated by machine training.
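The threshold rule in this example can be written as a short lookup. The thresholds 0.9, 0.7, 0.6 and 0.4 are the ones given above; the names `SCORE_THRESHOLDS` and `score_to_level` are hypothetical.

```python
# Boundaries between grades 1|2, 2|3, 3|4 and 4|5, in descending order.
SCORE_THRESHOLDS = [0.9, 0.7, 0.6, 0.4]


def score_to_level(score, thresholds=SCORE_THRESHOLDS):
    """Return the grade for a machine-learned score.

    A score must be strictly above a threshold to reach the better grade,
    matching "0.9 or less and above 0.7" in the example.
    """
    for level, threshold in enumerate(thresholds, start=1):
        if score > threshold:
            return level
    return len(thresholds) + 1  # at or below the last threshold: lowest grade


levels = [score_to_level(s) for s in (0.98, 0.9, 0.65, 0.5, 0.4)]
```

Here a score of exactly 0.9 falls in the second grade and a score of exactly 0.4 falls in the fifth, because each upper boundary is exclusive.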
It is easily understood that the above describes the process of training on the scoring results and the key information vectors only schematically, and the developer may also determine the score thresholds in other ways. For example, after the 1000 scores are sorted in descending order, they may be divided into five parts, which are assigned to levels 1 to 5 from highest to lowest. In this case, level 1 may contain, for example, 216 data samples, all with machine-learned scores of 0.93 or more; level 2 may contain, for example, 511 data samples, with scores below 0.93 and at or above 0.78; level 3 may contain, for example, 127 data samples, with scores below 0.78 and at or above 0.63; level 4 may contain, for example, 101 data samples, with scores below 0.63 and at or above 0.48; and level 5 may contain, for example, 45 data samples, with scores below 0.48. At this point, the score thresholds separating levels 1 to 5 may be 0.93, 0.78, 0.63, and 0.48, respectively. The specific rules for generating the training model are not particularly limited in the present disclosure and may be defined by the developer.
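One possible reading of this rank-based variant is that the sorted scores are cut so that each grade receives as many samples as the graders assigned to it, and the lowest score inside each grade becomes its threshold. The sketch below follows that reading, which is an interpretation rather than something the disclosure states; `thresholds_from_grades` and the ten-sample data set are hypothetical.

```python
from collections import Counter


def thresholds_from_grades(scores, expert_levels, n_levels=5):
    """Derive descending thresholds from sorted scores and expert grade counts.

    Each returned threshold is the lowest score still inside that grade, so a
    score greater than or equal to it belongs to the grade.
    """
    ranked = sorted(scores, reverse=True)
    counts = Counter(expert_levels)
    thresholds = []
    cumulative = 0
    for level in range(1, n_levels):
        if counts[level] == 0:
            raise ValueError(f"no sample was graded at level {level}")
        cumulative += counts[level]
        thresholds.append(ranked[cumulative - 1])
    return thresholds


# Small invented data set: two samples per grade.
scores = [0.95, 0.93, 0.85, 0.78, 0.70, 0.63, 0.55, 0.48, 0.30, 0.20]
expert_levels = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
derived = thresholds_from_grades(scores, expert_levels)
```

With this data the derived thresholds are 0.93, 0.78, 0.63 and 0.48, and the lower bound of each grade is inclusive, as in the counts quoted above.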
In step S18, data to be evaluated is acquired and the training model is run against the data to be evaluated to obtain a scoring result of the data to be evaluated as an evaluation result.
Steps S10 to S16 operate on data samples, which may be obtained by random extraction. After the training model is generated, when data evaluation is required, the server may obtain the data to be evaluated and run the training model to obtain a scoring result, which is used as the evaluation result of the data to be evaluated. That is, first, it may be determined whether the data to be evaluated contains the key information determined in step S10, and a key information vector of the data to be evaluated may be generated according to the determination result. Next, the score corresponding to that key information vector is determined with the training model generated in step S16, the scoring result of the data to be evaluated is determined from the specific value of the score, and the scoring result is taken as the evaluation result.
Still taking medical record data in the medical industry as an example, if running the training model on the data to be evaluated yields a score of, for example, 0.88, then in the example where data with a score of 0.9 or less and above 0.7 is assigned to level 2, the scoring result of the data to be evaluated may be determined as level 2, and level 2 is taken as its evaluation result.
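Step S18 can be sketched by combining a fitted linear model with the thresholds from the earlier example. The weights and bias below are invented so that the score comes out near 0.88; they are not values from the disclosure, and `evaluate_record` is a hypothetical name.

```python
def evaluate_record(vector, weights, bias, thresholds):
    """Return (score, grade) for one key information vector.

    `thresholds` are descending grade boundaries; a score must be strictly
    above a boundary to reach the better grade.
    """
    score = bias + sum(w * x for w, x in zip(weights, vector))
    level = len(thresholds) + 1  # default: lowest grade
    for candidate, threshold in enumerate(thresholds, start=1):
        if score > threshold:
            level = candidate
            break
    return score, level


# Hypothetical trained parameters for four key items.
WEIGHTS = [0.30, 0.30, 0.18, 0.12]
BIAS = 0.10

# The record to be evaluated contains the first three key items only.
score, level = evaluate_record([1, 1, 1, 0], WEIGHTS, BIAS, [0.9, 0.7, 0.6, 0.4])
```

The resulting score is approximately 0.88, which lies between 0.7 and 0.9, so the record is placed in the second grade.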
The entire process of one embodiment of the data evaluation method of the present disclosure will be explained with reference to FIG. 2. In step S201, the server determines first key information according to the basic medical record writing specification, where the first key information includes information based on data integrity; in step S203, the server may obtain second key information determined by a medical expert according to a clinical pathway, a medical guideline, and the like, where the second key information includes index information corresponding to hypertension; in step S205, the server may obtain 1000 hypertension medical records, which medical experts in the hypertension field grade on levels 1 to 5; in step S207, the server may determine which items of the first key information and the second key information are contained in each of the 1000 medical records, and generate the key information vectors; in step S209, the server may train on the scoring results of step S205 and the key information vectors generated in step S207 through a regression algorithm to generate a training model; in step S211, the server may run the training model on the medical record data to be evaluated to obtain a scoring result of the medical record data to be evaluated as the evaluation result.
Taking medical record data in the medical industry as an example, the data evaluation method of the present disclosure has the following advantages. First, once medical experts have graded the medical record data samples, medical record data are analyzed and graded by software without human participation, which removes the need for continuous manual effort. Second, in a scheme where the key information includes information based on data integrity, the completeness of the medical record information can be evaluated. Third, medical experts in each disease field determine the second key information, and medical record data are evaluated on the basis of that information, which ensures a disease-oriented professional evaluation of the medical record data. Fourth, by evaluating the medical record data, researchers can extract highly rated medical records for research, making better use of historical treatment experience and facilitating the informatization of the medical industry. In addition, the scheme can replace subjective evaluation of medical record data by related personnel (for example, staff in a medical record room), which avoids unstable evaluation of the medical record data and reduces the risk of incorrect evaluation.
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Further, the present exemplary embodiment also provides a data evaluation apparatus.
Fig. 3 schematically shows a block diagram of a data evaluation device of an exemplary embodiment of the present disclosure. Referring to fig. 3, the data evaluation apparatus 3 according to an exemplary embodiment of the present disclosure may include a key information determination module 31, a data sample scoring module 33, a key information vector generation module 35, a training model generation module 37, and a data evaluation module 39, wherein:
a key information determining module 31, which can be used to determine key information in the data;
the data sample scoring module 33 may be configured to obtain a preset number of data samples and provide the preset number of data samples to a scoring person, so that the scoring person scores each data sample to obtain a scoring result of each data sample;
a key information vector generating module 35, configured to determine whether each data sample has the key information, and generate a key information vector for each data sample according to a determination result;
a training model generation module 37, configured to train the scoring result and the key information vector through a regression algorithm to generate a training model; and
the data evaluation module 39 may be configured to acquire data to be evaluated and run the training model for the data to be evaluated, so as to obtain a scoring result of the data to be evaluated as an evaluation result.
In the data evaluation device, the scoring results of the data samples are obtained, key information vectors of the data samples are generated, the scoring results and the key information vectors are trained through a regression algorithm, and the data to be evaluated is evaluated according to the generated training model. On the one hand, human participation is reduced, which mitigates unstable data evaluation caused by the subjective factors of the scorer; on the other hand, because human participation is reduced, the efficiency of data evaluation can be improved.
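The work of the key information vector generating module 35 can be illustrated with a short sketch. The claims mention two options, a binary comparison and a similarity calculation; the text does not say how similarity is computed, so the fuzzy string ratio from Python's standard `difflib` module is used here purely as an assumed stand-in. The key information phrases and the sample text are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical key information phrases (not taken from the patent text).
KEY_INFO = ["chief complaint", "blood pressure", "discharge diagnosis"]


def binary_vector(text, key_info=KEY_INFO):
    """1 if the key phrase occurs verbatim in the text, else 0."""
    lowered = text.lower()
    return [1 if item in lowered else 0 for item in key_info]


def similarity_vector(text, key_info=KEY_INFO):
    """Best fuzzy-match ratio between each key phrase and any word window
    of the same length in the text (an assumed similarity measure)."""
    words = text.lower().split()
    vector = []
    for item in key_info:
        width = len(item.split())
        windows = [" ".join(words[i:i + width])
                   for i in range(max(1, len(words) - width + 1))]
        best = max(SequenceMatcher(None, item, w).ratio() for w in windows)
        vector.append(round(best, 2))
    return vector


# "presure" is misspelled on purpose to show the difference between the two.
record_text = "Chief complaint: headache. Blood presure 160/100."
binary = binary_vector(record_text)
similar = similarity_vector(record_text)
```

The binary variant misses the misspelled "blood presure", while the similarity variant still assigns it a high value, which is one reason a similarity-based vector might be preferred for free-text records.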
According to an exemplary embodiment of the present disclosure, referring to fig. 4, the training model generation module 37 may include a score acquisition unit 401 and a score threshold determination unit 403, wherein:
a score obtaining unit 401, configured to perform fitting analysis on the scoring result and the key information vector of each data sample through a regression algorithm to obtain a score of each data sample;
the score threshold determination unit 403 may be configured to determine a score threshold for grading scores according to the scoring result of each data sample and by combining the scores of each data sample.
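One way the score threshold determination unit 403 might derive thresholds is sketched below. The text does not specify the rule, so this sketch assumes the threshold between two adjacent grades is the midpoint of the mean fitted scores of those grades; the function name `grade_thresholds` and all numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def grade_thresholds(scores, grades):
    """Return {(better_grade, worse_grade): threshold} for adjacent grades.

    Assumption: the threshold is the midpoint between the mean fitted scores
    of two adjacent expert grades; the patent text does not fix this rule.
    """
    by_grade = defaultdict(list)
    for score, grade in zip(scores, grades):
        by_grade[grade].append(score)
    ordered = sorted(by_grade)  # grade 1 is assumed to be the best grade
    return {
        (a, b): (mean(by_grade[a]) + mean(by_grade[b])) / 2
        for a, b in zip(ordered, ordered[1:])
    }


fitted_scores = [0.95, 0.92, 0.80, 0.78, 0.60]  # made-up regression outputs
expert_grades = [1, 1, 2, 2, 3]                 # made-up expert grades
thresholds = grade_thresholds(fitted_scores, expert_grades)
```

Here the boundary between grades 1 and 2 comes out near 0.86 and the boundary between grades 2 and 3 near 0.70. Other rules, such as choosing the cut that minimizes misclassified samples, would fit the description equally well.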
According to an exemplary embodiment of the present disclosure, the key information includes first key information and/or second key information;
the first key information comprises information based on data integrity, and the second key information comprises index information corresponding to the evaluation dimension.
According to an exemplary embodiment of the present disclosure, referring to fig. 5, the key information determining module 31 may include a first key information determining unit 501 and a second key information determining unit 503, wherein:
a first key information determining unit 501, configured to determine the first key information according to a preset data specification; and/or
The second key information determination unit 503 may be configured to determine index information specified by the scorer as the second key information.
Since each functional module of the data evaluation apparatus according to the embodiment of the present invention is the same as that in the method embodiment described above, it is not described herein again.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
Referring to fig. 6, a program product 600 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 700 according to this embodiment of the invention is described below with reference to fig. 7. The electronic device 700 shown in fig. 7 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 7, electronic device 700 is embodied in the form of a general purpose computing device. The components of the electronic device 700 may include, but are not limited to: the at least one processing unit 710, the at least one memory unit 720, a bus 730 connecting different system components (including the memory unit 720 and the processing unit 710), and a display unit 740.
The storage unit 720 stores program code that can be executed by the processing unit 710, so that the processing unit 710 performs the steps according to various exemplary embodiments of the present invention as described in the above section "exemplary method" of the present specification. For example, the processing unit 710 may execute the steps shown in fig. 1: step S10, determining key information in the data; step S12, acquiring a preset number of data samples and providing the data samples to a scorer for scoring, so as to obtain a scoring result of each data sample; step S14, judging whether the key information exists in each data sample and generating a key information vector for each data sample according to the judgment result; step S16, training the scoring results and the key information vectors through a regression algorithm to generate a training model; and step S18, acquiring data to be evaluated and running the training model on the data to be evaluated to obtain a scoring result of the data to be evaluated as an evaluation result.
The storage unit 720 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 7201 and/or a cache memory unit 7202, and may further include a read only memory unit (ROM) 7203.
The storage unit 720 may also include a program/utility 7204 having a set (at least one) of program modules 7205, such program modules 7205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 730 may be any representation of one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 700 may also communicate with one or more external devices 800 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 700, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 700 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 750. Also, the electronic device 700 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) via the network adapter 760. As shown, the network adapter 760 communicates with the other modules of the electronic device 700 via the bus 730. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (4)

1. A medical record data evaluation method is characterized by comprising the following steps:
determining key information in medical record data;
acquiring a preset number of data samples and providing the data samples for a grader to grade the data samples so as to obtain a grading result of each data sample;
judging whether the key information exists in each data sample or not and generating a key information vector aiming at each data sample according to a judgment result;
training the scoring results and the key information vectors through a regression algorithm to generate a training model; and
acquiring medical record data to be evaluated and operating the training model aiming at the medical record data to be evaluated to obtain a scoring result of the medical record data to be evaluated as an evaluation result;
the key information comprises first key information and/or second key information;
the first key information comprises information based on data integrity, and the second key information comprises index information corresponding to the evaluation dimension;
the determining key information in the medical record data comprises:
determining first key information according to a preset data specification, and/or determining index information designated by a grader as second key information;
the judging whether the key information exists in each data sample and generating a key information vector aiming at each data sample according to a judgment result includes:
comparing the data samples with the key information to generate a binary comparison result, and generating a key information vector aiming at each data sample according to the binary comparison result; or calculating the similarity between the data samples and the key information, and generating a key information vector aiming at each data sample according to the similarity;
training the scoring results and the key information vectors through a regression algorithm includes:
performing fitting analysis on the scoring result and the key information vector of each data sample through a regression algorithm to obtain a score of each data sample;
and determining a score threshold value for grading scores according to the scoring result of each data sample and the score of each data sample.
2. A medical record data evaluation device, comprising:
the key information determining module is used for determining key information in the medical record data;
the data sample scoring module is used for acquiring a preset number of data samples and providing the data samples to a scoring person so that the scoring person can score the data samples to obtain a scoring result of the data samples;
the key information vector generating module is used for judging whether the key information exists in each data sample and generating a key information vector aiming at each data sample according to a judgment result;
the training model generation module is used for training the scoring result and the key information vector through a regression algorithm to generate a training model; and
the data evaluation module is used for acquiring medical record data to be evaluated and operating the training model aiming at the medical record data to be evaluated so as to obtain a scoring result of the medical record data to be evaluated as an evaluation result;
the key information comprises first key information and/or second key information;
the first key information comprises information based on data integrity, and the second key information comprises index information corresponding to the evaluation dimension;
the determining key information in the medical record data comprises:
determining first key information according to a preset data specification, and/or determining index information designated by a grader as second key information;
the judging whether the key information exists in each data sample and generating a key information vector aiming at each data sample according to a judgment result includes:
comparing the data samples with the key information to generate a binary comparison result, and generating a key information vector aiming at each data sample according to the binary comparison result; or calculating the similarity between the data samples and the key information, and generating a key information vector aiming at each data sample according to the similarity;
training the scoring results and the key information vectors through a regression algorithm includes:
performing fitting analysis on the scoring result and the key information vector of each data sample through a regression algorithm to obtain a score of each data sample;
and determining a score threshold value for grading scores according to the scoring result of each data sample and the score of each data sample.
3. A storage medium having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, implements the data evaluation method of claim 1.
4. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the data evaluation method of claim 1 via execution of the executable instructions.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711006245.5A CN107845408B (en) 2017-10-25 2017-10-25 Data evaluation method and device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN107845408A CN107845408A (en) 2018-03-27
CN107845408B true CN107845408B (en) 2020-10-27

Family

ID=61663103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711006245.5A Active CN107845408B (en) 2017-10-25 2017-10-25 Data evaluation method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN107845408B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190879B (en) * 2018-07-18 2020-08-11 阿里巴巴集团控股有限公司 Method and device for training adaptation level evaluation model and evaluating adaptation level
CN109614987A (en) * 2018-11-08 2019-04-12 北京字节跳动网络技术有限公司 More disaggregated model optimization methods, device, storage medium and electronic equipment
CN109858812B (en) * 2019-01-31 2021-08-24 泰康保险集团股份有限公司 Human resource management method, device, medium and electronic equipment based on block chain
CN111949847A (en) * 2020-08-14 2020-11-17 中国工商银行股份有限公司 Information evaluation method, information evaluation device, computer system, and medium
CN115269932B (en) * 2022-09-29 2022-12-20 江西联创精密机电有限公司 Training scoring method and device for simulation training equipment, storage medium and equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809360A (en) * 2016-03-16 2016-07-27 北京银讯财富信息技术有限公司 P2P industry risk assessment method
CN105824806A (en) * 2016-06-13 2016-08-03 腾讯科技(深圳)有限公司 Quality evaluation method and device for public accounts
CN106294363A (en) * 2015-05-15 2017-01-04 厦门美柚信息科技有限公司 A kind of forum postings evaluation methodology, Apparatus and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9430532B2 (en) * 2013-07-30 2016-08-30 NETFLIX Inc. Media content rankings for discovery of novel content

Also Published As

Publication number Publication date
CN107845408A (en) 2018-03-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant