CN113535933A - Case retrieval method and device, electronic equipment and storage device - Google Patents

Info

Publication number
CN113535933A
CN113535933A CN202110610809.6A CN202110610809A CN113535933A CN 113535933 A CN113535933 A CN 113535933A CN 202110610809 A CN202110610809 A CN 202110610809A CN 113535933 A CN113535933 A CN 113535933A
Authority
CN
China
Prior art keywords
case
text
candidate
retrieved
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110610809.6A
Other languages
Chinese (zh)
Other versions
CN113535933B (en
Inventor
胡弘康
盛志超
李�浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN202110610809.6A
Publication of CN113535933A
Application granted
Publication of CN113535933B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/18Legal services
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • Technology Law (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a case retrieval method and device, an electronic device, and a storage device. The case retrieval method comprises: acquiring a case to be retrieved and a case base, wherein the case base comprises a plurality of candidate cases and first feature information of the candidate cases, and the first feature information relates to at least one of the law articles relevant to the candidate cases and the case content of the candidate cases; extracting second feature information of the case to be retrieved, wherein the second feature information relates to at least one of the law articles relevant to the case to be retrieved and the case content of the case to be retrieved; and selecting, based on the first feature information and the second feature information, at least one candidate case as a target case matching the case to be retrieved. This scheme can improve the accuracy of case retrieval.

Description

Case retrieval method and device, electronic equipment and storage device
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a case retrieval method and apparatus, an electronic device, and a storage apparatus.
Background
With the development of electronic information technology, it has become more and more common to store cases in electronic form. By storing cases in an electronic form, relevant personnel can conveniently look up the cases, thereby providing great convenience for office work. However, an effective case retrieval means capable of accurately retrieving the case matching the case to be retrieved from the case library is still lacking at present. In view of the above, how to improve the accuracy of case retrieval is an urgent problem to be solved.
Disclosure of Invention
The application mainly solves the technical problem of providing a case retrieval method and device, an electronic device and a storage device, and can improve the accuracy of case retrieval.
In order to solve the above technical problem, a first aspect of the present application provides a case retrieval method, including: acquiring a case to be retrieved and a case base, wherein the case base comprises a plurality of candidate cases and first feature information of the candidate cases, and the first feature information relates to at least one of the law articles relevant to the candidate cases and the case content of the candidate cases; extracting second feature information of the case to be retrieved, wherein the second feature information relates to at least one of the law articles relevant to the case to be retrieved and the case content of the case to be retrieved; and selecting, based on the first feature information and the second feature information, at least one candidate case as a target case matching the case to be retrieved.
In order to solve the above technical problem, a second aspect of the present application provides a case retrieval device, which includes a case acquisition module, a feature extraction module, and a case selection module. The case acquisition module is used for acquiring a case to be retrieved and a case base, wherein the case base comprises a plurality of candidate cases and first feature information of the candidate cases, and the first feature information relates to at least one of the law articles relevant to the candidate cases and the case content of the candidate cases. The feature extraction module is used for extracting second feature information of the case to be retrieved, wherein the second feature information relates to at least one of the law articles relevant to the case to be retrieved and the case content of the case to be retrieved. The case selection module is used for selecting, based on the first feature information and the second feature information, at least one candidate case as a target case matching the case to be retrieved.
In order to solve the above technical problem, a third aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, where the memory stores program instructions, and the processor is configured to execute the program instructions to implement the case retrieval method in the first aspect.
In order to solve the above technical problem, a fourth aspect of the present application provides a storage device, which stores program instructions executable by a processor, where the program instructions are used to implement the case retrieval method in the first aspect.
According to the above scheme, a case to be retrieved and a case base are obtained, where the case base comprises a plurality of candidate cases and first feature information of the candidate cases, and the first feature information relates to at least one of the law articles relevant to the candidate cases and the case content of the candidate cases. Second feature information of the case to be retrieved is extracted, where the second feature information relates to at least one of the law articles relevant to the case to be retrieved and the case content of the case to be retrieved. On this basis, at least one candidate case is selected as a target case matching the case to be retrieved based on the first feature information and the second feature information.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a case retrieval method of the present application;
FIG. 2 is a state diagram of an embodiment of the case retrieval method of the present application;
FIG. 3 is a flow diagram of one embodiment of obtaining a law article set;
FIG. 4 is a block diagram of an embodiment of a law article analysis network;
FIG. 5 is a state diagram of one embodiment of obtaining a law article set related to a case;
FIG. 6 is a flow diagram of one embodiment of obtaining a semantic representation;
FIG. 7 is a block diagram of an embodiment of a text classification network;
FIG. 8 is a flowchart illustrating an embodiment of step S13 in FIG. 1;
FIG. 9 is a schematic flow chart illustrating another embodiment of step S13 in FIG. 1;
FIG. 10 is a state diagram illustrating one embodiment of obtaining a second match score;
FIG. 11 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 12 is a schematic diagram of a framework of an embodiment of the case retrieval device of the present application;
FIG. 13 is a block diagram of an embodiment of a storage device according to the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship. Further, the term "plurality" herein means two or more.
Referring to fig. 1, fig. 1 is a schematic flow chart illustrating an embodiment of a case retrieval method according to the present application.
Specifically, the method may include the steps of:
Step S11: Obtain the case to be retrieved and a case base.
In one implementation scenario, the case to be retrieved may include key text related to the case content, and the key text may include, but is not limited to, accident details, accident consequences, and division of responsibility. Taking a civil case as an example, the case to be retrieved may be "On XX/XX/XXXX, Zhang San rear-ended the car in front driven by Xiao Hong, causing Xiao Hong minor injuries; Zhang San bears primary responsibility". Taking a criminal case as an example, the case to be retrieved may be "On XX/XX/XXXX, Li Si damaged a communication cable, causing a communication outage; Li Si bears criminal responsibility". Other cases can be deduced by analogy and are not enumerated here.
In one implementation scenario, the case to be retrieved may be derived from user input. For example, the user may input through a keyboard, or may input through a microphone, which is not limited herein.
It should be noted that, in the embodiment of the present disclosure, the case base may include a plurality of candidate cases and first feature information of the candidate cases, where the first feature information relates to at least one of relevant laws of the candidate cases and case contents of the candidate cases.
In one implementation scenario, in order to reduce redundant information in the case base and improve the efficiency of case retrieval, each candidate case may retain only the text content of the "plaintiff's claims" section and the "court's findings" section, with all other text content removed.
In one implementation scenario, in a case where the first feature information relates to the law articles relevant to the candidate case, the first feature information may specifically include a first law article set related to the candidate case. Taking the candidate case "Zhao Wu drove a motor vehicle without a license, and on the way to a certain place, fled after causing an accident" as an example, the first law article set may include Article 19 of the Road Traffic Safety Law (i.e., driving a motor vehicle requires a lawfully obtained motor vehicle driving license) and Article 53 of the Tort Liability Law (i.e., compensation and rescue costs where a motor vehicle driver flees after a traffic accident). Other cases can be deduced by analogy and are not enumerated here.
In a specific implementation scenario, the candidate case may be split into sentences to obtain a plurality of first sub-texts, and each first sub-text may be analyzed with a law article analysis network to obtain a law article classification representation of that first sub-text. The law article classification representation includes a plurality of elements arranged in sequence: when an element takes a first numerical value, the first sub-text is related to the law article corresponding to the ordinal position of that element, and when an element takes a second numerical value, the first sub-text is unrelated to that law article. The specific process is described in the embodiments disclosed below and is not repeated here.
In another specific implementation scenario, the candidate case may be split into sentences using periods, commas, and similar punctuation as boundaries. Still taking the aforementioned candidate case "Zhao Wu drove a motor vehicle without a license, and on the way to a certain place, fled after causing an accident" as an example, splitting at commas yields the following first sub-texts: "Zhao Wu drove a motor vehicle without a license", "and on the way to a certain place", and "fled after causing an accident". Other cases can be deduced by analogy and are not enumerated here.
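The punctuation-based splitting described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_into_subtexts` and the exact set of boundary characters (ASCII and full-width periods, commas, and semicolons) are assumptions made for the example.

```python
import re
from typing import List

# Assumed boundary set: ASCII and full-width periods, commas, and semicolons.
_BOUNDARIES = re.compile(r"[.,;。，；]")


def split_into_subtexts(case_text: str) -> List[str]:
    """Split a case text at punctuation boundaries and drop empty pieces."""
    pieces = _BOUNDARIES.split(case_text)
    return [piece.strip() for piece in pieces if piece.strip()]


case = ("Zhao Wu drove a motor vehicle without a license, "
        "and on the way to a certain place, "
        "fled after causing an accident.")
subtexts = split_into_subtexts(case)
for subtext in subtexts:
    print(subtext)
```

A regular expression keeps the boundary set easy to change; a real system would also need to protect punctuation that does not end a clause, such as decimal points in amounts.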
In yet another specific implementation scenario, the law article analysis network may include a semantic extraction sub-network, a fully connected layer, a normalization layer, and the like, which is not limited herein. The semantic extraction sub-network may include, but is not limited to, BERT (Bidirectional Encoder Representations from Transformers), RoBERTa, XLNet, and the like.
In another specific implementation scenario, the dimension of the law article classification representation equals the total number of law articles; that is, the representation contains one element per law article, and the element at each ordinal position corresponds to a different law article. Taking a total of 5 law articles as an example, the element at the first ordinal position may correspond to law article No. 1, the element at the second ordinal position may correspond to law article No. 2, and in general the element at the i-th ordinal position may correspond to law article No. i. Which law articles No. 1, No. 2, and No. i denote may be set according to actual application needs. For example, law article No. 1 may refer to Article 19 of the Road Traffic Safety Law, or it may instead correspond to Article 53 of the Tort Liability Law as needed, which is not limited herein.
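To make the positional encoding concrete, the sketch below turns the raw outputs of a fully connected layer into a classification representation and then into a set of law articles. Several details are assumptions for illustration only: the first and second numerical values are taken to be 1 and 0, the normalization layer is taken to be a sigmoid, the decision threshold is 0.5, and the `ARTICLES` table and the logits are invented placeholders.

```python
import math
from typing import List, Set

# Hypothetical table: ordinal position i holds law article No. i+1 (5 articles in total).
ARTICLES = ["Article No. 1", "Article No. 2", "Article No. 3",
            "Article No. 4", "Article No. 5"]


def sigmoid(x: float) -> float:
    """Assumed normalization: squash a logit into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def to_classification_representation(logits: List[float],
                                     threshold: float = 0.5) -> List[int]:
    """Emit 1 (assumed first value) for related articles, 0 (assumed second value) otherwise."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def related_articles(representation: List[int]) -> Set[str]:
    """Collect the articles whose ordinal position holds the value 1."""
    return {ARTICLES[i] for i, value in enumerate(representation) if value == 1}


logits = [2.3, -1.7, 0.4, -3.0, -0.2]  # made-up scores for one sub-text
representation = to_classification_representation(logits)
articles = related_articles(representation)
print(representation, sorted(articles))
```

Under these assumptions, taking the union of `related_articles` over all sub-texts of a case would be one plausible way to build the law article set for the whole case.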
In yet another implementation scenario, where the first feature information relates to the case content of the candidate case, the first feature information may include first semantic representations of several text types, each derived from the sub-texts in the candidate case that belong to that text type. The text types may include, but are not limited to, accident details, accident consequences, and division of responsibility. Taking the candidate case "On XX/XX/XXXX, Zhang San hit New Silk, causing New Silk serious injury; Zhang San bears primary responsibility" as an example, the sub-text "Zhang San hit New Silk" belongs to the text type "accident details", the sub-text "causing New Silk serious injury" belongs to the text type "accident consequences", and the sub-text "Zhang San bears primary responsibility" belongs to the text type "division of responsibility". Other cases can be deduced by analogy and are not enumerated here.
In a specific implementation scenario, the candidate case may be split into sentences to obtain a plurality of second sub-texts, and target sub-texts belonging to any one of the several text types are screened from them. On this basis, a text semantic representation of each target sub-text may be extracted, and a final semantic representation of each text type is obtained from the text semantic representations of the target sub-texts belonging to that text type; here, the final semantic representation is the first semantic representation of the text type. The specific process is described in the embodiments disclosed below and is not repeated here.
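The passage above does not say how the per-sub-text representations of one text type are merged into a final representation. The sketch below assumes element-wise averaging (mean pooling), which is only one plausible choice; the function name `aggregate_by_type` and the two-dimensional vectors are illustrative placeholders for real encoder outputs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate_by_type(
    target_subtexts: List[Tuple[str, List[float]]]
) -> Dict[str, List[float]]:
    """Average the vectors of all target sub-texts that share a text type (assumed pooling)."""
    grouped: Dict[str, List[List[float]]] = defaultdict(list)
    for text_type, vector in target_subtexts:
        grouped[text_type].append(vector)
    return {
        text_type: [sum(column) / len(vectors) for column in zip(*vectors)]
        for text_type, vectors in grouped.items()
    }


# Each pair is (text type, text semantic representation of one target sub-text).
target_subtexts = [
    ("accident details", [1.0, 3.0]),
    ("accident details", [3.0, 5.0]),
    ("accident consequences", [0.5, 0.5]),
]
final_representations = aggregate_by_type(target_subtexts)
print(final_representations)
```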
In another specific implementation scenario, sentence division may be performed on the candidate cases with periods, commas, and the like as boundaries. Reference may be made to the foregoing description for details, which are not repeated herein.
In another specific implementation scenario, a text classification network may be used to classify the second sub-texts and obtain the text type of each. Note that some second sub-texts may not belong to any text type. The text classification network may therefore output, for each second sub-text, the probability that it belongs to each text type and the probability that it belongs to none of them, and these probability values determine either the text type to which the second sub-text belongs or that it belongs to no text type. For details, reference may be made to the related descriptions in the embodiments disclosed below, which are not repeated here.
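The decision rule described above can be sketched as an argmax over the text types plus one extra "none" class. The layout of the probability vector (text types first, "none" last), the `TEXT_TYPES` list, and the function name `predict_text_type` are assumptions for this example; the probabilities are made up.

```python
from typing import List, Optional

TEXT_TYPES = ["accident details", "accident consequences", "division of responsibility"]


def predict_text_type(probabilities: List[float]) -> Optional[str]:
    """Return the most probable text type, or None when the 'none' class wins.

    Assumed layout: one probability per entry of TEXT_TYPES, followed by a
    final probability that the sub-text belongs to no text type.
    """
    if len(probabilities) != len(TEXT_TYPES) + 1:
        raise ValueError("expected one probability per text type plus one for 'none'")
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return TEXT_TYPES[best] if best < len(TEXT_TYPES) else None


typed = predict_text_type([0.1, 0.7, 0.1, 0.1])
untyped = predict_text_type([0.1, 0.1, 0.1, 0.7])
print(typed, untyped)
```

Sub-texts for which the function returns `None` would simply be excluded from the target sub-texts.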
In yet another implementation scenario, where the first feature information relates to both the relevant law articles and the case content of the candidate case, the first feature information may include a first law article set related to the candidate case and first semantic representations of several text types. For the first law article set and the first semantic representations, reference may be made to the foregoing description, and details are not repeated here.
Step S12: Extract second feature information of the case to be retrieved.
In the embodiments of the present disclosure, the second feature information relates to at least one of the law articles relevant to the case to be retrieved and the case content of the case to be retrieved. Note that because the user does not need to manually summarize keywords from the case to be retrieved and then search by those keywords, case retrieval becomes more convenient and more intelligent.
In one implementation scenario, similar to the first feature information, in the case that the second feature information relates to the law articles relevant to the case to be retrieved, the second feature information may specifically include a second law article set related to the case to be retrieved.
In a specific implementation scenario, the case to be retrieved may be split into sentences to obtain a plurality of first sub-texts, and each first sub-text may be analyzed with the law article analysis network to obtain a law article classification representation of that first sub-text. The law article classification representation includes a plurality of elements arranged in sequence: when an element takes a first numerical value, the first sub-text is related to the law article corresponding to the ordinal position of that element, and when an element takes a second numerical value, the first sub-text is unrelated to that law article. The specific process is described in the embodiments disclosed below and is not repeated here.
In another implementation scenario, similar to the first feature information, in the case that the second feature information relates to the case content of the case to be retrieved, the second feature information may specifically include second semantic representations of several text types.
In a specific implementation scenario, the case to be retrieved may be split into sentences to obtain a plurality of second sub-texts, and target sub-texts belonging to any one of the several text types are screened from them. On this basis, a text semantic representation of each target sub-text may be extracted, and a final semantic representation of each text type is obtained from the text semantic representations of the target sub-texts belonging to that text type; here, the final semantic representation is the second semantic representation of the text type. The specific process is described in the embodiments disclosed below and is not repeated here.
In a further implementation scenario, similar to the first feature information, in the case that the second feature information relates to both the relevant law articles and the case content of the case to be retrieved, the second feature information may specifically include a second law article set related to the case to be retrieved and second semantic representations of several text types. Reference may be made to the foregoing description for details, which are not repeated here.
In order to improve the efficiency of case retrieval and reduce the probability of extracting redundant feature information, the first feature information and the second feature information may relate to the same kinds of content. For example, where the first feature information relates to the law articles relevant to the candidate case, the second feature information may likewise relate to the law articles relevant to the case to be retrieved; where the first feature information relates to the case content of the candidate case, the second feature information may likewise relate to the case content of the case to be retrieved; and where the first feature information relates to both, the second feature information may likewise relate to both. In addition, to further improve the accuracy of case retrieval, both kinds of content may be used together: the first feature information may specifically include a first law article set related to the candidate case and first semantic representations of several text types, and the second feature information may specifically include a second law article set related to the case to be retrieved and second semantic representations of several text types.
Step S13: Select, based on the first feature information and the second feature information, at least one candidate case as a target case matching the case to be retrieved.
In one implementation scenario, as described above, in a case where the first feature information relates to the law articles relevant to the candidate case and the second feature information relates to the law articles relevant to the case to be retrieved, the first feature information may specifically include a first law article set related to the candidate case, and the second feature information may specifically include a second law article set related to the case to be retrieved. The target case can then be selected from the candidate cases based on the first law article set and the second law article set, so that cases are modeled from the law article perspective and the accuracy of case retrieval is improved.
In a specific implementation scenario, for each candidate case, a first matching score between the candidate case and the case to be retrieved may be obtained based on how the second law article set covers the first law article set of the candidate case, and at least one candidate case is selected as a target case based on the first matching score. The specific process is described in the embodiments disclosed below and is not repeated here. Because the first matching score is derived from this coverage, it describes the degree of matching between the candidate case and the case to be retrieved from the law article perspective, which helps improve the accuracy of case retrieval.
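The exact coverage formula is left to later embodiments. As a hedged sketch, the code below assumes the first matching score is the fraction of the candidate's law articles that also appear in the query's law article set; the function name `first_matching_score` and the abbreviated article labels are illustrative only.

```python
from typing import Set


def first_matching_score(first_set: Set[str], second_set: Set[str]) -> float:
    """Assumed coverage: share of the candidate's law articles covered by the query's set."""
    if not first_set:
        return 0.0  # a candidate with no law articles cannot be covered
    return len(first_set & second_set) / len(first_set)


# first_set: law articles of a candidate case; second_set: law articles of the case to be retrieved.
first_set = {"Road Traffic Safety Law Art. 19", "Tort Liability Law Art. 53"}
second_set = {"Road Traffic Safety Law Art. 19"}
score = first_matching_score(first_set, second_set)
print(score)
```

Other set-overlap measures, such as the Jaccard index, would fit the same description; the choice here is made only to keep the example short.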
In another implementation scenario, as described above, in a case where the first feature information relates to the case content of the candidate case and the second feature information relates to the case content of the case to be retrieved, the first feature information includes first semantic representations of several text types, and the second feature information includes second semantic representations of the same text types. The first semantic representations are obtained from the sub-texts in the candidate case that belong to the respective text types, and the second semantic representations are obtained from the sub-texts in the case to be retrieved that belong to the respective text types. The target case can then be selected from the candidate cases based on the first and second semantic representations, so that cases are modeled from the content perspective; compared with retrieving cases directly by string matching, this can improve the accuracy of case retrieval.
In a specific implementation scenario, for each candidate case, a second matching score between the candidate case and the case to be retrieved may be obtained based on the similarity between the second semantic representation of each text type and the first semantic representation of the same text type, and at least one candidate case may be selected as the target case based on the second matching score. The specific process is described in the embodiments disclosed below and is not repeated here. Because the second matching score is derived from these per-type similarities, it describes the degree of matching between the candidate case and the case to be retrieved from the content perspective, which helps improve the accuracy of case retrieval.
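A content-based score of this kind might be computed as below. The similarity measure (cosine similarity) and the way per-type similarities are combined (a plain average over text types present in both cases) are assumptions; the passage only states that the score is based on per-type similarities. The vectors are toy values.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def second_matching_score(first_reprs: Dict[str, List[float]],
                          second_reprs: Dict[str, List[float]]) -> float:
    """Average the per-type similarities over text types present in both cases (assumed rule)."""
    shared = [t for t in second_reprs if t in first_reprs]
    if not shared:
        return 0.0
    return sum(cosine(second_reprs[t], first_reprs[t]) for t in shared) / len(shared)


first_reprs = {"accident details": [1.0, 0.0], "accident consequences": [0.0, 1.0]}
second_reprs = {"accident details": [1.0, 0.0], "accident consequences": [1.0, 0.0]}
content_score = second_matching_score(first_reprs, second_reprs)
print(content_score)
```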
In yet another implementation scenario, in a case where the first feature information relates to both the relevant law articles and the case content of the candidate case, and the second feature information relates to both the relevant law articles and the case content of the case to be retrieved, the first feature information may specifically include a first law article set related to the candidate case and first semantic representations of several text types, and the second feature information may specifically include a second law article set related to the case to be retrieved and second semantic representations of several text types. Reference may be made to the foregoing description for details, which are not repeated here.
In a specific implementation scenario, please refer to fig. 2 in combination, and fig. 2 is a state diagram illustrating an embodiment of the case search method according to the present application. As shown in fig. 2, for each candidate case in the case base, a first matching score between the candidate case and the case to be retrieved may be obtained from a law matching perspective based on a coverage condition of the second law set on the first law set of the candidate case, and from the content matching perspective, obtaining a second matching score between the candidate case and the case to be retrieved based on the similarity between the second semantic representations of the plurality of text types and the first semantic representation of the same text type respectively, on the basis, the first matching score and the second matching score can be synthesized, candidate cases matched with the cases to be retrieved are screened from the case base to serve as target cases, specifically, the total matching score between the candidate cases and the cases to be retrieved can be obtained on the basis of the first matching score and the second matching score, and therefore at least one candidate case can be selected to serve as a target case on the basis of the total matching score. By the method, the matching degree between the candidate case and the case to be retrieved can be accurately described from the perspective of the law and the content, so that the matching degree between the target case obtained by retrieval and the case to be retrieved can be improved, and the accuracy of case retrieval can be further improved.
In another specific implementation scenario, the first matching score and the second matching score may be weighted by a first weight and a second weight, respectively, to obtain the total matching score. For convenience of description, the first matching score between the i-th candidate case and the case to be retrieved may be denoted $Score_i^{Law}$, and the second matching score between the i-th candidate case and the case to be retrieved may be denoted $Score_i^{Content}$. The total matching score $Score_i$ between the i-th candidate case and the case to be retrieved can then be expressed as:
$Score_i = \lambda_{Law} \cdot Score_i^{Law} + \lambda_{Content} \cdot Score_i^{Content}$ ……(1)
In the above formula (1), $\lambda_{Law}$ and $\lambda_{Content}$ represent the first weight and the second weight, respectively. The values of the first weight and the second weight can be set according to actual application requirements. For example, where more attention is paid to the law article perspective, the first weight may be set greater than the second weight, e.g., the first weight may be set to 0.7 and the second weight to 0.3; where more attention is paid to the content perspective, the first weight may be set smaller than the second weight, e.g., the first weight may be set to 0.3 and the second weight to 0.7; and where equal attention is paid to the law article perspective and the content perspective, the first weight may be set equal to the second weight, e.g., both may be set to 0.5. No limitation is imposed here.
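As an illustration of formula (1), the weighting can be sketched in a few lines of Python. This is a minimal sketch only: the function name `total_match_score` and the sample score values are assumptions introduced for illustration and do not appear in the disclosure.

```python
def total_match_score(score_law: float, score_content: float,
                      w_law: float = 0.5, w_content: float = 0.5) -> float:
    """Weighted sum of the first (law article) and second (content) matching scores."""
    return w_law * score_law + w_content * score_content


# Hypothetical scores for one candidate case, emphasising the law article perspective.
print(total_match_score(2.0, 0.8, w_law=0.7, w_content=0.3))
```

Swapping the two weights (0.3 and 0.7) would instead emphasise the content perspective, as described above.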
According to the above scheme, a case to be retrieved and a case base are obtained, where the case base includes several candidate cases and first feature information of the candidate cases, the first feature information relating to at least one of the relevant law articles of the candidate case and the case content of the candidate case. Second feature information of the case to be retrieved is extracted, the second feature information relating to at least one of the relevant law articles of the case to be retrieved and the case content of the case to be retrieved. On this basis, at least one candidate case is selected as a target case matching the case to be retrieved, based on the first feature information and the second feature information.
Referring to fig. 3, fig. 3 is a flowchart of an embodiment of obtaining a law article set. Specifically, the method includes the following steps:
Step S31: Perform sentence segmentation on the case to obtain several first sub-texts.
As described in the foregoing disclosed embodiments, a case may be segmented with periods, commas, and the like as boundaries. For details, reference may be made to the related description in the foregoing disclosed embodiments, which is not repeated here. Taking the case "Zhao Wu drove a motor vehicle without a license, and on the way to a certain place, fled after causing an accident" as an example, the following first sub-texts can be obtained with commas as boundaries: "Zhao Wu drove a motor vehicle without a license", "and on the way to a certain place", and "fled after causing an accident". Other cases can be deduced by analogy and are not enumerated here.
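The segmentation in step S31 can be sketched as a simple punctuation split. The boundary set below (Chinese and ASCII commas, periods, semicolons, exclamation and question marks) and the helper name `split_into_sub_texts` are assumptions for illustration; the disclosure only names periods and commas as example boundaries.

```python
import re
from typing import List

# Assumed boundary characters; the disclosure mentions periods, commas, "and the like".
_BOUNDARIES = r"[，。；！？,.;!?]"


def split_into_sub_texts(case_text: str) -> List[str]:
    """Split a case description at punctuation boundaries and drop empty pieces."""
    parts = re.split(_BOUNDARIES, case_text)
    return [p.strip() for p in parts if p.strip()]


case = ("Zhao Wu drove a motor vehicle without a license, "
        "and on the way to a certain place, fled after causing an accident")
print(split_into_sub_texts(case))
```

Splitting on ASCII periods is crude for English text with abbreviations or decimals; a production system would likely use a dedicated sentence segmenter.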
Step S32: Analyze the several first sub-texts respectively using a law article analysis network to obtain a law article classification representation of each first sub-text.
In one implementation scenario, please refer to fig. 4, which is a schematic framework diagram of an embodiment of the law article analysis network. As shown in fig. 4, the law article analysis network may include a semantic extraction sub-network, a fully connected layer, and a normalization layer.
In a specific implementation scenario, as described in the foregoing disclosed embodiments, the semantic extraction sub-network may include, but is not limited to, BERT, RoBERTa, XLNet, and the like. The semantic extraction sub-network performs semantic extraction on the first sub-text to obtain a feature vector D containing the semantic information of the first sub-text.
In another specific implementation scenario, the fully connected layer performs a linear transformation on the feature vector D, mapping it to a dimension equal to the total number of law articles, to obtain a new feature vector D'.
In yet another specific implementation scenario, the normalization function employed by the normalization layer may include, but is not limited to, a Sigmoid function. The normalization layer converts the value of each element in the feature vector D' to a value between 0 and 1, sets the element to a first value (e.g., 1) if the converted value is not less than 0.5, and sets the element to a second value (e.g., 0) if the converted value is less than 0.5.
It should be noted that, in the embodiments of the present disclosure, the law article classification representation includes several elements arranged in sequence. Where an element is the first value (e.g., 1), the first sub-text is related to the law article corresponding to the ordinal position of that element; where an element is the second value (e.g., 0), the first sub-text is not related to the law article corresponding to the ordinal position of that element.
With continued reference to fig. 4, taking the first sub-text "Zhao Wu drove a motor vehicle without a license" as an example, analysis by the law article analysis network yields the law article classification representation "0001000000000", i.e., the first sub-text is related to the law article corresponding to the 4th element. Since the law article corresponding to the 4th element is preset as "Article 19 of the Road Traffic Safety Law", it can be determined that the first sub-text "Zhao Wu drove a motor vehicle without a license" is related to the law article "Article 19 of the Road Traffic Safety Law". Other cases can be deduced by analogy and are not enumerated here.
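The thresholding performed by the normalization layer, and the decoding of the resulting 0/1 representation into law articles, can be sketched as follows. This is a sketch under stated assumptions: the input values stand in for the feature vector D' that a trained network would produce, the four-entry article table is shortened for illustration, and every entry other than "Article 19 of the Road Traffic Safety Law" is a placeholder name.

```python
import math
from typing import List


def classify_law_articles(d_prime: List[float], threshold: float = 0.5) -> List[int]:
    """Apply Sigmoid to each element, then set it to 1 if >= threshold, else 0."""
    return [1 if 1.0 / (1.0 + math.exp(-z)) >= threshold else 0 for z in d_prime]


def related_articles(bits: List[int], article_names: List[str]) -> List[str]:
    """Return the law articles whose ordinal position holds the first value (1)."""
    return [name for bit, name in zip(bits, article_names) if bit == 1]


# Placeholder article table; only the 4th entry comes from the example above.
ARTICLES = ["placeholder article 1", "placeholder article 2", "placeholder article 3",
            "Article 19 of the Road Traffic Safety Law"]

bits = classify_law_articles([-3.0, -2.0, -4.0, 2.5])  # made-up D' values
print(bits, related_articles(bits, ARTICLES))
```

Because each element is thresholded independently, a sub-text can be related to zero, one, or several law articles.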
Step S33: Obtain the law article set related to the case based on the law article classification representations of the several first sub-texts.
Specifically, as described above, the law articles related to each first sub-text may be obtained based on the law article classification representations of the several first sub-texts, and on this basis, the union of the law articles related to the several first sub-texts may be used as the law article set related to the case.
Referring to fig. 5, fig. 5 is a state diagram of an embodiment of obtaining the law article set related to a case. As shown in fig. 5, still taking the first sub-texts of the aforementioned case "Zhao Wu drove a motor vehicle without a license, and on the way to a certain place, fled after causing an accident" as an example: the first sub-texts "Zhao Wu drove a motor vehicle without a license", "and on the way to a certain place", and "fled after causing an accident" are each analyzed by the law article analysis network to obtain their corresponding law articles. That is, the first sub-text "Zhao Wu drove a motor vehicle without a license" corresponds to "Article 19 of the Road Traffic Safety Law", the first sub-text "and on the way to a certain place" has no corresponding law article, and the first sub-text "fled after causing an accident" corresponds to "Article 50 of the Tort Liability Law", so the law article set of the case is {Article 19 of the Road Traffic Safety Law, Article 50 of the Tort Liability Law}. Other cases can be deduced by analogy and are not enumerated here.
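The union in step S33 amounts to a set merge over the per-sub-text results. The helper name `law_article_set` below is an assumption for illustration; the inputs mirror the fig. 5 example, with an empty list for the sub-text that has no corresponding law article.

```python
from typing import Iterable, List, Set


def law_article_set(per_sub_text_articles: Iterable[List[str]]) -> Set[str]:
    """Union of the law articles related to each first sub-text."""
    result: Set[str] = set()
    for articles in per_sub_text_articles:
        result |= set(articles)
    return result


per_sub_text = [
    ["Article 19 of the Road Traffic Safety Law"],  # driving without a license
    [],                                             # no related law article
    ["Article 50 of the Tort Liability Law"],       # fleeing after the accident
]
print(law_article_set(per_sub_text))
```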
It should be noted that, where the case is a candidate case, the law article set is the first law article set, and where the case is the case to be retrieved, the law article set is the second law article set.
Unlike the foregoing embodiment, sentence segmentation is performed on the case to obtain several first sub-texts, the first sub-texts are analyzed respectively by the law article analysis network to obtain their law article classification representations, and the law article set related to the case is obtained based on those classification representations, which can improve the efficiency of obtaining the law article set.
Referring to fig. 6, fig. 6 is a flowchart of an embodiment of obtaining semantic representations. Specifically, the method includes the following steps:
Step S61: Perform sentence segmentation on the case to obtain several second sub-texts.
As described in the foregoing disclosed embodiments, a case may be segmented with periods, commas, and the like as boundaries. For details, reference may be made to the related description in the foregoing disclosed embodiments, which is not repeated here. Taking the case "XXXX year XX month XX day, Zhang San knocked down Xiao Juan, causing Xiao Juan serious injury, and Zhang San bears main responsibility" as an example, the following second sub-texts can be obtained with commas as boundaries: "XXXX year XX month XX day", "Zhang San knocked down Xiao Juan", "causing Xiao Juan serious injury", and "Zhang San bears main responsibility". Other cases can be deduced by analogy and are not enumerated here.
Step S62: Screen, from the several second sub-texts, the target sub-texts that belong to any one of the several text types.
In an implementation scenario, as described in the foregoing disclosed embodiments, a text classification network may be used to classify the several second sub-texts respectively, to obtain the text type to which each second sub-text belongs or to determine that the second sub-text does not belong to any text type. Specifically, as described above, the text classification network may output the probability values that the second sub-text belongs to each of the several text types and that it belongs to no text type, so that the text type to which the second sub-text belongs, or the fact that it belongs to no text type, can be determined from the maximum probability value. For convenience of description, a second sub-text that does not belong to any text type may be regarded as belonging to "irrelevant text".
In a specific implementation scenario, please refer to fig. 7, which is a schematic framework diagram of an embodiment of the text classification network. As shown in fig. 7, the text classification network may include a semantic extraction sub-network, a fully connected layer, and a normalization layer. The semantic extraction sub-network may specifically include, but is not limited to, BERT, RoBERTa, XLNet, and the like. The semantic extraction sub-network performs semantic extraction on the second sub-text to obtain a feature vector E containing the semantic information of the second sub-text. The fully connected layer performs a linear transformation on the feature vector E, mapping it to a dimension equal to the total number of types, to obtain a new feature vector E'. It should be noted that the total number of types is the number of text types plus 1. For example, where the several text types include accident details, accident consequences, and responsibility division, the total number of types is 4; other cases can be deduced by analogy and are not enumerated here. Further, the normalization function employed by the normalization layer may include, but is not limited to, a Softmax function, which normalizes the feature vector E' into probability values.
In another specific implementation scenario, referring to fig. 7 and still taking the above case as an example, after the second sub-text "Zhang San knocked down Xiao Juan" is classified by the text classification network, the probability values that it belongs to "irrelevant text", "accident details", "accident consequences", and "responsibility division" are 0.1, 0.8, 0.04, and 0.06, respectively. The text type "accident details", which corresponds to the maximum probability value of 0.8, can therefore be taken as the text type to which the second sub-text "Zhang San knocked down Xiao Juan" belongs, and this second sub-text can be taken as a target sub-text. Other cases can be deduced by analogy and are not enumerated here.
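The Softmax normalization and maximum-probability selection described above can be sketched as follows. The input values stand in for the feature vector E' of a trained network and are made up for illustration; the function names are likewise assumptions, and the label order follows the example (with "irrelevant text" as the extra class).

```python
import math
from typing import List, Tuple

LABELS = ["irrelevant text", "accident details",
          "accident consequences", "responsibility division"]


def softmax(values: List[float]) -> List[float]:
    """Numerically stable Softmax: subtract the maximum before exponentiating."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def predict_text_type(e_prime: List[float],
                      labels: List[str] = LABELS) -> Tuple[str, float]:
    """Return the label with the maximum probability together with that probability."""
    probs = softmax(e_prime)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


label, prob = predict_text_type([0.0, 2.0, -1.0, -0.5])  # made-up E' values
print(label, round(prob, 3))
```

A sub-text predicted as "irrelevant text" would be discarded; any other prediction makes it a target sub-text of that text type.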
Step S63: Extract the text semantic representation of each target sub-text.
In an implementation scenario, a semantic extraction network such as BERT, RoBERTa, or XLNet may be used to extract the text semantic representation of the target sub-text; the specific process is not described here again.
In a specific implementation scenario, still taking the case "XXXX year XX month XX day, Zhang San knocked down Xiao Juan, causing Xiao Juan serious injury, and Zhang San bears main responsibility" as an example, classification by the text classification network finally determines that the second sub-text "XXXX year XX month XX day" belongs to "irrelevant text" and can therefore be excluded, that the second sub-text "Zhang San knocked down Xiao Juan" belongs to "accident details", that the second sub-text "causing Xiao Juan serious injury" belongs to "accident consequences", and that the second sub-text "Zhang San bears main responsibility" belongs to "responsibility division". These three second sub-texts can thus be taken as target sub-texts, and their text semantic representations are extracted respectively. Other cases can be deduced by analogy and are not enumerated here.
In another specific implementation scenario, in order to improve the accuracy of the text semantic representation, the semantic extraction network is pre-trained on a sentence pair matching task. Note that the sentence pair matching (Sentence Pair Matching) task means: given two texts, the goal is to judge whether a certain type of relationship holds between them. That is, a mapping function is learned during training which takes the two texts as input and outputs one label from the task's classification label set. Pre-training the semantic extraction network on the sentence pair matching task can improve its ability to extract text semantic information accurately.
Step S64: Obtain the final semantic representation of the corresponding text type based on the text semantic representations of the target sub-texts belonging to the same text type.
Specifically, the text semantic representations of the target sub-texts belonging to the same text type may be averaged to obtain an average semantic representation of the corresponding text type, and the average semantic representation may be normalized to obtain the final semantic representation of that text type. Averaging followed by normalization in this manner can improve the accuracy of the final semantic representation.
In an implementation scenario, the text semantic representation may include several elements arranged in sequence. In the averaging process, the elements at the same ordinal position in the text semantic representations of the target sub-texts belonging to the same text type may then be averaged to obtain the average semantic representation of the text type.
In another implementation scenario, during the normalization process, the modulus (norm) of the average semantic representation may be obtained, and each element in the average semantic representation may be divided by the modulus to obtain the final semantic representation.
In another implementation scenario, after the text semantic representation of the target sub-text is extracted, a fully connected layer may be used to reduce the vector dimension of the text semantic representation and obtain a new text semantic representation, so as to save storage space and speed up computation. On this basis, the final semantic representation of the corresponding text type can be obtained based on the new text semantic representations of the target sub-texts belonging to the same text type.
In still another implementation scenario, for convenience of description, the target sub-texts belonging to the same text type may be denoted X, which contains N target sub-texts $X_1, X_2, \ldots, X_N$. The final semantic representation $Emb(X)$ of this text type can then be expressed as:
$Emb(X) = Normalize\left(\frac{1}{N}\sum_{j=1}^{N} Linear\left(BERT(X_j)\right)\right)$ ……(2)
In the above formula (2), $X_j$ represents the j-th target sub-text among the target sub-texts belonging to the same text type, $BERT(X_j)$ represents the text semantic representation extracted from the j-th target sub-text by the semantic extraction network BERT, Linear represents a linear transformation, which may specifically be implemented by a fully connected layer, and Normalize represents the normalization processing.
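The averaging and normalization of step S64 can be sketched on plain Python lists. This sketch assumes the per-sub-text vectors have already been produced by the semantic extraction network and the fully connected layer, so those two stages are not modeled; the function name and the two-dimensional toy vectors are assumptions for illustration.

```python
import math
from typing import List


def final_semantic_representation(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of the vectors, then division by the modulus (L2 norm)."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    # Guard against an all-zero mean, for which the normalization is undefined.
    return [x / norm for x in mean] if norm > 0 else mean


# Two toy vectors for target sub-texts of the same text type.
print(final_semantic_representation([[3.0, 0.0], [0.0, 4.0]]))
```

Because the result has unit length, the dot product of two such representations equals their cosine similarity.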
It should be noted that, where the case is a candidate case, the final semantic representation is the first semantic representation, and where the case is the case to be retrieved, the final semantic representation is the second semantic representation.
Unlike the foregoing embodiment, sentence segmentation is performed on the case to obtain several second sub-texts, and the target sub-texts belonging to any one of the several text types are screened from the second sub-texts, so that the text information relevant to the retrieval task can be extracted from the case automatically, which improves the convenience of case retrieval.
Referring to fig. 8, fig. 8 is a flowchart of an embodiment of step S13 in fig. 1. Specifically, fig. 8 is a flowchart of obtaining the target case where the first feature information relates to the relevant law articles of the candidate case and the second feature information relates to the relevant law articles of the case to be retrieved. Specifically, the method includes the following steps:
Step S81: For each candidate case, obtain a first matching score between the candidate case and the case to be retrieved based on how the second law article set covers the first law article set of the candidate case.
Specifically, the coverage condition may include a first number, namely the number of law articles contained in both the first law article set and the second law article set, and a second number, namely the number of law articles contained in the first law article set but not in the second law article set. For ease of description, the first law article set of a candidate case may be denoted $S_{Doc}$ and the second law article set of the case to be retrieved may be denoted $S_{Query}$; the first number and the second number can then be expressed as:
$Num_1 = |S_{Query} \cap S_{Doc}|$ ……(3)
$Num_2 = |S_{Doc} - S_{Query}|$ ……(4)
In the above formulas (3) and (4), $Num_1$ denotes the first number, $Num_2$ denotes the second number, $S_{Query} \cap S_{Doc}$ represents the intersection of the first law article set and the second law article set, i.e., the law articles contained in both sets, and $S_{Doc} - S_{Query}$ represents the law articles contained in the first law article set but not in the second law article set. Where the second law article set completely covers the first law article set, the first number $Num_1$ reaches its maximum (i.e., the total number of law articles contained in the first law article set) and the second number $Num_2$ reaches its minimum (i.e., 0), and the candidate case and the case to be retrieved can be regarded as a complete match. Similarly, where the second law article set partially covers the first law article set, the candidate case and the case to be retrieved can be regarded as a partial match. Conversely, where the second law article set does not cover the first law article set at all, the first number $Num_1$ reaches its minimum (i.e., 0) and the second number $Num_2$ reaches its maximum (i.e., the total number of law articles contained in the first law article set), and the candidate case and the case to be retrieved can be regarded as a complete mismatch. On this basis:
In one implementation scenario, the difference between the first number and the second number may be used directly as the first matching score.
In another implementation scenario, to control the influence of the first number and the second number on the first matching score, the difference between the first number and the product of the second number and an adjustment coefficient may be used as the first matching score. For convenience of description, the adjustment coefficient may be denoted γ, and the first matching score $Score_{Law}$ can then be expressed as:
$Score_{Law} = |S_{Query} \cap S_{Doc}| - \gamma\,|S_{Doc} - S_{Query}|$ ……(5)
In the above formula (5), the adjustment coefficient γ can be set according to the actual application. For example, where more attention is paid to the second number, the adjustment coefficient γ may be set slightly larger, such as 0.5 or 0.6; where less attention is paid to the second number, the adjustment coefficient γ may be set slightly smaller, such as 0.2 or 0.3. The specific value of the adjustment coefficient γ is not limited here. In this way, each law article that is in the first law article set but not in the second law article set reduces the first matching score by an amount equal to the adjustment coefficient, which facilitates controlling the influence of the first number and the second number on the first matching score.
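Formulas (3) to (5) map directly onto Python set operations. In the minimal sketch below, the function name `first_match_score` and the short labels "A", "B", "C" standing in for law articles are assumptions for illustration.

```python
from typing import Set


def first_match_score(s_query: Set[str], s_doc: Set[str], gamma: float = 0.5) -> float:
    """Score_Law = |S_Query ∩ S_Doc| - gamma * |S_Doc - S_Query|."""
    num1 = len(s_query & s_doc)   # law articles of the candidate covered by the query
    num2 = len(s_doc - s_query)   # law articles of the candidate not covered
    return num1 - gamma * num2


# The query covers "A" but not "C" from the candidate's set.
print(first_match_score({"A", "B"}, {"A", "C"}, gamma=0.5))
```

Setting `gamma=1.0` reproduces the simpler variant in which the difference between the first number and the second number is used directly.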
Step S82: Select at least one candidate case as a target case based on the first matching score.
Specifically, the candidate cases in the case base may be sorted in descending order of the first matching score, and the candidate cases ranked within a preset number of top positions may be selected as target cases. For example, the top 5 candidate cases may be selected as target cases, which is not limited here.
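The sort-and-select step can be sketched as follows, assuming the matching scores are held in a dictionary keyed by a candidate case identifier; the identifiers, score values, and function name are hypothetical.

```python
from typing import Dict, List


def select_top_cases(scores: Dict[str, float], k: int = 5) -> List[str]:
    """Sort candidate cases by score in descending order and keep the top k."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical matching scores for three candidate cases.
print(select_top_cases({"case_1": 0.2, "case_2": 0.9, "case_3": 0.5}, k=2))
```

The same helper applies unchanged whether the ranking uses the first matching score, the second matching score, or the total matching score.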
Unlike the foregoing embodiment, for each candidate case, the first matching score between the candidate case and the case to be retrieved is obtained from how the second law article set covers the first law article set of the candidate case, and the target case is selected based on the first matching score, so that the degree of match between the candidate case and the case to be retrieved can be characterized accurately from the law article perspective, which can improve the accuracy of case retrieval.
Referring to fig. 9, fig. 9 is a flowchart of another embodiment of step S13 in fig. 1. Specifically, fig. 9 is a flowchart of obtaining the target case where the first feature information relates to the case content of the candidate case and the second feature information relates to the case content of the case to be retrieved. Specifically, the method includes the following steps:
Step S91: For each candidate case, obtain a second matching score between the candidate case and the case to be retrieved based on the similarity between the second semantic representation of each text type and the first semantic representation of the same text type.
Specifically, for each candidate case, the similarity between the second semantic representation and the first semantic representation of the same text type may be obtained for each text type, and the similarity corresponding to each text type may be weighted by the preset weight corresponding to that text type to obtain the second matching score. In this manner, the second matching score combines the dimensions of the different text types, which improves its accuracy.
In an implementation scenario, a dot product operation may be performed on the first semantic representation and the second semantic representation of the same text type to obtain the similarity between the two.
In another implementation scenario, the preset weight corresponding to each text type may be set according to the importance of the text type. Take as an example the case where the several text types include accident details, accident consequences, and responsibility division. Where accident details are relatively important, the preset weight corresponding to accident details may be set slightly larger; for example, the preset weights corresponding to accident details, accident consequences, and responsibility division may be set to 0.6, 0.2, and 0.2, respectively. Where accident consequences are more important, the preset weight corresponding to accident consequences may be set slightly larger; for example, the three preset weights may be set to 0.2, 0.6, and 0.2, respectively. Where responsibility division is more important, the preset weight corresponding to responsibility division may be set slightly larger; for example, the three preset weights may be set to 0.2, 0.2, and 0.6, respectively. No limitation is imposed here.
In yet another implementation scenario, please refer to fig. 10, which is a state diagram of an embodiment of obtaining the second matching score. As shown in fig. 10, take as an example the case to be retrieved "XXXX year XX month XX day, Zhang San knocked down Xiao Juan, causing Xiao Juan serious injury, and Zhang San bears main responsibility", which includes four second sub-texts: "XXXX year XX month XX day", "Zhang San knocked down Xiao Juan", "causing Xiao Juan serious injury", and "Zhang San bears main responsibility". After the four second sub-texts are classified by the text classification network, it can be determined that the second sub-text "XXXX year XX month XX day" belongs to "irrelevant text", the second sub-text "Zhang San knocked down Xiao Juan" belongs to "accident details", the second sub-text "causing Xiao Juan serious injury" belongs to "accident consequences", and the second sub-text "Zhang San bears main responsibility" belongs to "responsibility division"; the latter three second sub-texts are taken as target sub-texts. On this basis, the semantic extraction network is used to extract the text semantic representations of the three target sub-texts respectively, and the second semantic representations of accident details, accident consequences, and responsibility division are obtained based on the text semantic representations of the target sub-texts belonging to the same text type.
On this basis, for each candidate case, the similarity of accident details is obtained from the first semantic representation and the second semantic representation of accident details, the similarity of accident consequences is obtained from the first semantic representation and the second semantic representation of accident consequences, and the similarity of responsibility division is obtained from the first semantic representation and the second semantic representation of responsibility division; the three similarities are then weighted by their respective preset weights to obtain the second matching score.
In another implementation scenario, for convenience of description, still taking the case where the several text types include accident details, accident consequences, and responsibility division as an example, the preset weight corresponding to the i-th text type may be denoted $w_i$, and the second matching score $Score_{Content}$ can then be expressed as:
$Score_{Content} = \sum_{i=1}^{3} w_i \cdot \left(Emb(Doc_i) \cdot Emb(Query_i)\right)$ ……(6)
In the above formula (6), $Emb(Doc_i)$ represents the first semantic representation of the i-th text type, $Emb(Query_i)$ represents the second semantic representation of the i-th text type, and the "·" between the two representations denotes a dot product operation. Where the several text types are different from those in this example, the formula can be adapted by analogy, and no further examples are given here.
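Formula (6) can be sketched as a weighted sum of per-type dot products. In this sketch the representations are toy unit vectors keyed by text type, the function names are assumptions, and skipping a text type that is missing from either case is likewise an assumption for illustration, since the disclosure does not specify that situation.

```python
from typing import Dict, List


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def second_match_score(doc_reps: Dict[str, List[float]],
                       query_reps: Dict[str, List[float]],
                       weights: Dict[str, float]) -> float:
    """Sum over text types of weight * (Emb(Doc_i) . Emb(Query_i))."""
    return sum(w * dot(doc_reps[t], query_reps[t])
               for t, w in weights.items()
               if t in doc_reps and t in query_reps)  # assumed: skip missing types


weights = {"accident details": 0.6, "accident consequences": 0.2,
           "responsibility division": 0.2}
query = {"accident details": [1.0, 0.0], "accident consequences": [0.0, 1.0],
         "responsibility division": [1.0, 0.0]}
doc = {"accident details": [1.0, 0.0], "accident consequences": [1.0, 0.0],
       "responsibility division": [0.6, 0.8]}
print(second_match_score(doc, query, weights))
```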
Step S92: Select at least one candidate case as a target case based on the second matching score.
Specifically, the candidate cases in the case base may be sorted in descending order of the second matching score, and the candidate cases ranked within a preset number of top positions may be selected as target cases. For example, the top 5 candidate cases may be selected as target cases, which is not limited here.
Unlike the foregoing embodiment, for each candidate case, the second matching score between the candidate case and the case to be retrieved is obtained from the similarity between the second semantic representation of each text type and the first semantic representation of the same text type, and the target case is selected based on the second matching score, so that the degree of match between the candidate case and the case to be retrieved can be characterized accurately from the content perspective, which can improve the accuracy of case retrieval.
Referring to fig. 11, fig. 11 is a schematic framework diagram of an embodiment of an electronic device 110 of the present application. The electronic device 110 includes a memory 111 and a processor 112 coupled to each other; the memory 111 stores program instructions, and the processor 112 is configured to execute the program instructions to implement the steps in any of the case retrieval method embodiments described above. Specifically, the electronic device 110 may include, but is not limited to, desktop computers, notebook computers, servers, mobile phones, tablet computers, and the like.
In particular, the processor 112 is configured to control itself and the memory 111 to implement the steps in any of the case retrieval method embodiments described above. The processor 112 may also be referred to as a CPU (Central Processing Unit). The processor 112 may be an integrated circuit chip having signal processing capabilities. The processor 112 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 112 may be implemented jointly by multiple integrated circuit chips.
In the above scheme, a case to be retrieved and a case base are acquired, where the case base includes a plurality of candidate cases and first feature information of the candidate cases, and the first feature information relates to at least one of the relevant laws of the candidate cases and the case content of the candidate cases. Second feature information of the case to be retrieved is then extracted, where the second feature information relates to at least one of the relevant laws of the case to be retrieved and the case content of the case to be retrieved. On this basis, at least one candidate case is selected, based on the first feature information and the second feature information, as a target case matching the case to be retrieved.
Referring to fig. 12, fig. 12 is a schematic diagram of a framework of an embodiment of a case retrieval device 120 according to the present application. The case retrieval device 120 includes a case acquisition module 121, a feature extraction module 122 and a case selection module 123. The case acquisition module 121 is configured to acquire a case to be retrieved and a case base; the case base includes a plurality of candidate cases and first feature information of the candidate cases, where the first feature information relates to at least one of the relevant laws of the candidate cases and the case content of the candidate cases. The feature extraction module 122 is configured to extract second feature information of the case to be retrieved; the second feature information relates to at least one of the relevant laws of the case to be retrieved and the case content of the case to be retrieved. The case selection module 123 is configured to select, based on the first feature information and the second feature information, at least one candidate case as a target case matching the case to be retrieved.
In the above scheme, a case to be retrieved and a case base are acquired, where the case base includes a plurality of candidate cases and first feature information of the candidate cases, and the first feature information relates to at least one of the relevant laws of the candidate cases and the case content of the candidate cases. Second feature information of the case to be retrieved is then extracted, where the second feature information relates to at least one of the relevant laws of the case to be retrieved and the case content of the case to be retrieved. On this basis, at least one candidate case is selected, based on the first feature information and the second feature information, as a target case matching the case to be retrieved.
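The division of labour among the three modules can be sketched in a few lines. This is an illustrative sketch only: the class name, the callable fields, and the `top_k` parameter are hypothetical, the case base is assumed to already hold first feature information keyed by case identifier, and the feature extractor and scoring function are left pluggable because the device description does not fix them.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Features = Any  # a law set, per-type semantic representations, or both


@dataclass
class CaseRetrievalDevice:
    # Role of the feature extraction module 122: case text -> second feature information.
    extract_features: Callable[[str], Features]
    # Used by the case selection module 123: (first features, second features) -> score.
    score: Callable[[Features, Features], float]

    def retrieve(
        self, query_text: str, case_base: Dict[str, Features], top_k: int = 1
    ) -> List[str]:
        """Return identifiers of the top_k candidate cases for the case to be retrieved.

        The arguments play the role of the case acquisition module 121: they supply
        the case to be retrieved and the case base with its first feature information.
        """
        query_features = self.extract_features(query_text)
        ranked = sorted(
            case_base,
            key=lambda case_id: self.score(case_base[case_id], query_features),
            reverse=True,
        )
        return ranked[:top_k]
```

For instance, passing a word-set extractor and a set-overlap score gives a toy retriever; in practice the two callables would be replaced by the law-based or content-based matching described in the embodiments.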
In some disclosed embodiments, in a case where the first feature information relates to a relevant law for the candidate case and the second feature information relates to a relevant law for the case to be retrieved, the first feature information includes a first set of laws related to the candidate case and the second feature information includes a second set of laws related to the case to be retrieved.
Unlike the foregoing embodiment, in a case where the first feature information relates to the relevant laws of the candidate case and the second feature information relates to the relevant laws of the case to be retrieved, the first feature information is set to include a first law set related to the candidate case, and the second feature information is set to include a second law set related to the case to be retrieved. The target case can therefore be selected from the plurality of candidate cases based on the first law set and the second law set, so that cases are modeled from the perspective of the applicable laws, which helps improve the accuracy of case retrieval.
In some disclosed embodiments, the case selection module 123 includes a law matching sub-module configured to, for each candidate case, obtain a first matching score between the candidate case and the case to be retrieved based on the coverage condition of the second law set on the first law set of the candidate case; the case selection module 123 includes a first selection sub-module configured to select at least one candidate case as a target case based on the first matching score.
Unlike the foregoing embodiment, for each candidate case, the first matching score between the candidate case and the case to be retrieved is obtained from the coverage condition of the second law set on the first law set of the candidate case, and the target case is selected based on the first matching score. The degree of matching between the candidate case and the case to be retrieved can thus be described accurately from the perspective of the applicable laws, which helps improve the accuracy of case retrieval.
In some disclosed embodiments, the coverage condition includes: a first number of laws contained in both the second law set and the first law set, and a second number of laws contained in the first law set but not in the second law set. The law matching sub-module is specifically configured to take the difference between the first number and the product of the second number and an adjustment coefficient as the first matching score.
Unlike the foregoing embodiment, each law that is in the first law set but not in the second law set lowers the first matching score by an amount equal to the adjustment coefficient, which makes it easier to control the respective influence of the first number and the second number on the first matching score.
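As a worked illustration of this score, the sketch below computes the first number as the size of the intersection of the two law sets, the second number as the count of laws in the first law set that the second law set lacks, and returns their difference scaled by the adjustment coefficient. The function name, the article labels, and the default coefficient of 0.5 are assumptions made for the example only.

```python
from typing import Set


def first_match_score(
    first_law_set: Set[str],   # laws related to the candidate case
    second_law_set: Set[str],  # laws related to the case to be retrieved
    adjust: float = 0.5,       # adjustment coefficient (hypothetical default)
) -> float:
    """First matching score = first number - adjust * second number."""
    first_number = len(first_law_set & second_law_set)   # laws in both sets
    second_number = len(first_law_set - second_law_set)  # in first set, not in second
    return first_number - adjust * second_number
```

With a candidate whose law set is {Article 1, Article 2, Article 3} and a query whose law set is {Article 1, Article 2, Article 4}, the first number is 2, the second number is 1, and the score with a coefficient of 0.5 is 1.5. A larger coefficient penalizes candidates that cite many laws the query does not involve.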
In some disclosed embodiments, the case retrieval device 120 further includes a set obtaining module 124. The set obtaining module 124 includes a first sentence-splitting sub-module configured to split the case into sentences to obtain a plurality of first sub-texts. The set obtaining module 124 includes a law analysis sub-module configured to analyze the plurality of first sub-texts respectively by using a law analysis network to obtain a law classification representation of each first sub-text; the law classification representation includes a plurality of elements arranged in sequence, where an element being a first numerical value indicates that the first sub-text is related to the law corresponding to the ordinal position of the element, and an element being a second numerical value indicates that the first sub-text is not related to the law corresponding to the ordinal position of the element. The set obtaining module 124 includes a set obtaining sub-module configured to obtain a law set related to the case based on the law classification representations of the plurality of first sub-texts; the law set is the first law set when the case is a candidate case, and the law set is the second law set when the case is the case to be retrieved.
Unlike the foregoing embodiment, the case is split into sentences to obtain a plurality of first sub-texts, the first sub-texts are analyzed respectively by the law analysis network to obtain their law classification representations, and the law set related to the case is obtained based on the law classification representations of the first sub-texts, which helps improve the efficiency of obtaining the law set.
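The step from per-sentence classification representations to a case-level law set can be sketched as below. Several things here are assumptions: the first numerical value is taken to be 1 and the second 0, the case-level set is taken to be the union over all first sub-texts, the law catalogue `LAWS` is invented, and `toy_classify` is a keyword lookup standing in for the law analysis network, whose architecture is not given in this passage.

```python
from typing import Callable, List, Sequence, Set

# Hypothetical law catalogue: position i of a classification representation
# corresponds to LAWS[i].
LAWS: List[str] = ["Article 10", "Article 20", "Article 30"]


def toy_classify(sub_text: str) -> List[int]:
    """Stand-in for the law analysis network (keyword lookup, not a real model)."""
    keywords = ["contract", "damage", "theft"]
    return [1 if keyword in sub_text else 0 for keyword in keywords]


def law_set(
    sub_texts: Sequence[str],
    classify: Callable[[str], Sequence[int]],
    laws: Sequence[str],
    first_value: int = 1,
) -> Set[str]:
    """Collect every law flagged as related in at least one first sub-text."""
    related: Set[str] = set()
    for sub_text in sub_texts:
        representation = classify(sub_text)
        if len(representation) != len(laws):
            raise ValueError("representation length must match the law catalogue")
        for position, element in enumerate(representation):
            if element == first_value:
                related.add(laws[position])
    return related
```

The same function yields the first law set when applied to a candidate case and the second law set when applied to the case to be retrieved.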
In some disclosed embodiments, in a case where the first feature information relates to case content of the candidate case and the second feature information relates to case content of the case to be retrieved, the first feature information includes first semantic representations of several text types and the second feature information includes second semantic representations of the several text types; the first semantic representations of the text types are obtained respectively based on the sub-texts belonging to the text types in the candidate case, and the second semantic representations of the text types are obtained respectively based on the sub-texts belonging to the text types in the case to be retrieved.
Unlike the foregoing embodiment, in a case where the first feature information relates to case content of the candidate case and the second feature information relates to case content of the case to be retrieved, the first feature information is set to include first semantic representations of several text types and the second feature information is set to include second semantic representations of the several text types. The target case can therefore be selected from the plurality of candidate cases based on the first semantic representations and the second semantic representations, so that cases are modeled from the content perspective; compared with retrieving cases directly by character-string matching, this helps improve the accuracy of case retrieval.
In some disclosed embodiments, the case selection module 123 further includes a content matching sub-module configured to, for each candidate case, obtain a second matching score between the candidate case and the case to be retrieved based on the similarity between the second semantic representation of each of the several text types and the first semantic representation of the same text type; the case selection module 123 further includes a second selection sub-module configured to select at least one candidate case as a target case based on the second matching score.
Unlike the foregoing embodiment, for each candidate case, the second matching score between the candidate case and the case to be retrieved is obtained from the similarity between the second semantic representation of each of the several text types and the first semantic representation of the same text type, and the target case is selected based on the second matching score. The degree of matching between the candidate case and the case to be retrieved can thus be described accurately from the content perspective, which helps improve the accuracy of case retrieval.
In some disclosed embodiments, the content matching sub-module includes a similarity calculation unit for obtaining a similarity between the second semantic representation and the first semantic representation of the same text type, respectively; the content matching sub-module comprises a weighting processing unit which is used for weighting the similarity corresponding to the text type by using the preset weight corresponding to the text type to obtain a second matching score.
Unlike the foregoing embodiment, for each candidate case, the similarity between the second semantic representation and the first semantic representation of the same text type is obtained respectively, and the similarity corresponding to each text type is weighted by the preset weight corresponding to that text type to obtain the second matching score. The second matching score thus integrates the dimensions of different text types, which helps improve its accuracy.
In some disclosed embodiments, the case retrieval device 120 further includes a representation obtaining module 125. The representation obtaining module 125 includes a second sentence-splitting sub-module configured to split the case into sentences to obtain a plurality of second sub-texts, and to screen out, from the plurality of second sub-texts, target sub-texts belonging to any one of the several text types. The representation obtaining module 125 includes a representation extraction sub-module configured to extract text semantic representations of the target sub-texts, and to obtain a final semantic representation of the corresponding text type based on the text semantic representations of the target sub-texts belonging to the same text type; the final semantic representation is the first semantic representation when the case is a candidate case, and the final semantic representation is the second semantic representation when the case is the case to be retrieved.
Unlike the foregoing embodiment, the case is split into sentences to obtain a plurality of second sub-texts, and the target sub-texts belonging to any one of the several text types are screened out from the second sub-texts, so that the text information relevant to the retrieval task can be extracted from the case automatically, which improves the convenience of case retrieval.
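The splitting and screening step can be illustrated roughly as follows. This is a simplified sketch under stated assumptions: sentences are split on Chinese and Western end-of-sentence punctuation with a plain regular expression, the text-type names `claim` and `finding` are hypothetical, and `toy_type` is a keyword rule standing in for whatever text-type classifier an actual system would use (this passage does not specify one).

```python
import re
from typing import Callable, Dict, List, Optional, Sequence


def split_sentences(case_text: str) -> List[str]:
    """Split a case into sub-texts at end-of-sentence punctuation (simplistic)."""
    pieces = re.findall(r"[^。！？!?.]+[。！？!?.]?", case_text)
    return [piece.strip() for piece in pieces if piece.strip()]


def toy_type(sub_text: str) -> Optional[str]:
    """Stand-in text-type classifier (keyword rule, not a real model)."""
    if "claims" in sub_text:
        return "claim"
    if "finds" in sub_text:
        return "finding"
    return None  # sub-text belongs to none of the text types


def screen_targets(
    sub_texts: Sequence[str],
    classify_type: Callable[[str], Optional[str]],
    text_types: Sequence[str],
) -> Dict[str, List[str]]:
    """Keep only sub-texts belonging to one of the text types, grouped by type."""
    grouped: Dict[str, List[str]] = {text_type: [] for text_type in text_types}
    for sub_text in sub_texts:
        text_type = classify_type(sub_text)
        if text_type in grouped:
            grouped[text_type].append(sub_text)
    return grouped
```

Sub-texts that match no text type are dropped, which corresponds to discarding content irrelevant to the retrieval task. Note that the period-based split would also break decimal numbers, so a production splitter would need to be more careful.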
In some disclosed embodiments, the representation extraction sub-module includes an average processing unit configured to average the text semantic representations of the target sub-texts belonging to the same text type to obtain an average semantic representation of the corresponding text type; the representation extraction sub-module includes a normalization processing unit configured to normalize the average semantic representation to obtain the final semantic representation.
Unlike the foregoing embodiment, the text semantic representations of the target sub-texts belonging to the same text type are averaged to obtain the average semantic representation of the corresponding text type, and the average semantic representation is normalized to obtain the final semantic representation of the text type, which helps improve the accuracy of the final semantic representation.
Referring to fig. 13, fig. 13 is a schematic diagram of a framework of an embodiment of a storage device 130 according to the present application. The storage device 130 stores program instructions 131 executable by a processor, and the program instructions 131 are used to implement the steps in any of the case retrieval method embodiments described above.
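A minimal sketch of the averaging and normalization step is given below. The kind of normalization is not stated in this passage; L2 (unit-length) normalization is assumed here because it makes a later dot product between two final representations equal to their cosine similarity. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def final_representation(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average the sub-text representations of one text type, then L2-normalize."""
    if not vectors:
        raise ValueError("at least one target sub-text representation is required")
    dim = len(vectors[0])
    if any(len(vector) != dim for vector in vectors):
        raise ValueError("all representations must have the same dimension")
    # Element-wise mean over all target sub-texts of this text type.
    mean = [sum(vector[i] for vector in vectors) / len(vectors) for i in range(dim)]
    norm = math.sqrt(sum(value * value for value in mean))
    if norm == 0.0:
        return mean  # an all-zero average cannot be scaled to unit length
    return [value / norm for value in mean]
```

For two sub-text vectors (3, 0) and (0, 4), the average is (1.5, 2.0) with length 2.5, so the final representation is (0.6, 0.8).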
According to the scheme, the accuracy of case retrieval can be improved.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division into modules or units is only a logical division, and an actual implementation may use another division; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

Claims (13)

1. A case retrieval method, comprising:
acquiring a case to be retrieved and a case base; the case base comprises a plurality of candidate cases and first characteristic information of the candidate cases, wherein the first characteristic information relates to at least one of relevant laws of the candidate cases and case contents of the candidate cases;
extracting second characteristic information of the case to be retrieved; wherein the second characteristic information relates to at least one of a related law of the case to be retrieved and case content of the case to be retrieved;
and selecting at least one candidate case as a target case matched with the case to be retrieved based on the first characteristic information and the second characteristic information.
2. The method of claim 1, wherein in a case that the first feature information relates to a relevant law of the candidate case and the second feature information relates to a relevant law of the case to be retrieved, the first feature information comprises a first set of laws related to the candidate case, and the second feature information comprises a second set of laws related to the case to be retrieved.
3. The method of claim 2, wherein the selecting at least one of the candidate cases as the target case matching the case to be retrieved based on the first feature information and the second feature information comprises:
for each of the candidate cases: obtaining a first matching score between the candidate case and the case to be retrieved based on the coverage condition of the second law set on the first law set of the candidate case;
selecting at least one of the candidate cases as the target case based on the first match score.
4. The method of claim 3, wherein the coverage condition comprises: a first number of laws included in both the second law set and the first law set, and a second number of laws not included in the second law set but included in the first law set; the obtaining a first matching score between the candidate case and the case to be retrieved based on the coverage condition of the second law set on the first law set of the candidate case comprises:
taking the difference between the first number and the product of the second number and an adjustment coefficient as the first matching score.
5. The method of claim 2, wherein the step of obtaining the first set of laws or the step of obtaining the second set of laws comprises:
performing sentence splitting on a case to obtain a plurality of first sub-texts;
analyzing the plurality of first sub-texts respectively by using a law analysis network to obtain a law classification representation of each first sub-text; the law classification representation comprises a plurality of elements arranged in sequence, wherein, in a case that an element is a first numerical value, the first sub-text is related to the law corresponding to the ordinal position of the element, and in a case that an element is a second numerical value, the first sub-text is not related to the law corresponding to the ordinal position of the element;
obtaining a law set related to the case based on the law classification representations of the plurality of first sub-texts;
wherein, in a case that the case is the candidate case, the law set is the first law set, and in a case that the case is the case to be retrieved, the law set is the second law set.
6. The method according to claim 1, wherein in case that the first feature information relates to case content of the candidate case and the second feature information relates to case content of the case to be retrieved, the first feature information comprises first semantic representations of several text types and the second feature information comprises second semantic representations of the several text types;
the first semantic representations of the text types are obtained respectively based on the sub-texts belonging to the text types in the candidate case, and the second semantic representations of the text types are obtained respectively based on the sub-texts belonging to the text types in the case to be retrieved.
7. The method of claim 6, wherein the selecting at least one of the candidate cases as the target case matching the case to be retrieved based on the first feature information and the second feature information comprises:
for each of the candidate cases: obtaining a second matching score between the candidate case and the case to be retrieved based on the similarity between the second semantic representation of each of the several text types and the first semantic representation of the same text type;
selecting at least one of the candidate cases as the target case based on the second match score.
8. The method according to claim 7, wherein obtaining a second matching score between the candidate case and the case to be retrieved based on similarities between the second semantic representations of the text types and the first semantic representation of the same text type, respectively, comprises:
respectively acquiring the similarity between the second semantic representation and the first semantic representation of the same text type;
and carrying out weighting processing on the similarity corresponding to the text type by using a preset weight corresponding to the text type to obtain the second matching score.
9. The method of claim 6, wherein the step of obtaining a first semantic representation of the plurality of text types or the step of obtaining a second semantic representation of the plurality of text types comprises:
the case is subjected to sentence dividing processing to obtain a plurality of second sub texts, and target sub texts belonging to any one of the text types are screened from the second sub texts;
extracting text semantic representation of the target sub-text, and obtaining final semantic representation corresponding to the text type based on the text semantic representation of the target sub-text belonging to the same text type;
wherein, when the case is the candidate case, the final semantic representation is the first semantic representation, and when the case is the case to be retrieved, the final semantic representation is the second semantic representation.
10. The method according to claim 9, wherein the deriving a final semantic representation corresponding to the text type based on the text semantic representations of the target sub-texts belonging to the same text type comprises:
carrying out average processing on the text semantic representations of the target sub-texts belonging to the same text type to obtain an average semantic representation corresponding to the text type;
and carrying out normalization processing on the average semantic representation to obtain the final semantic representation.
11. A case retrieval apparatus, comprising:
the case acquisition module is used for acquiring a case to be retrieved and a case base; the case base comprises a plurality of candidate cases and first characteristic information of the candidate cases, wherein the first characteristic information relates to at least one of relevant laws of the candidate cases and case contents of the candidate cases;
the characteristic extraction module is used for extracting second characteristic information of the case to be retrieved; wherein the second characteristic information relates to at least one of a related law of the case to be retrieved and case content of the case to be retrieved;
and the case selection module is used for selecting at least one candidate case as a target case matched with the case to be retrieved based on the first characteristic information and the second characteristic information.
12. An electronic device, comprising a memory and a processor coupled to each other, wherein the memory stores program instructions, and the processor is configured to execute the program instructions to implement the case retrieval method of any one of claims 1 to 10.
13. A storage device storing program instructions executable by a processor for implementing the case retrieval method of any one of claims 1 to 10.
CN202110610809.6A 2021-06-01 2021-06-01 Case retrieval method and device, electronic equipment and storage device Active CN113535933B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110610809.6A CN113535933B (en) 2021-06-01 2021-06-01 Case retrieval method and device, electronic equipment and storage device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110610809.6A CN113535933B (en) 2021-06-01 2021-06-01 Case retrieval method and device, electronic equipment and storage device

Publications (2)

Publication Number Publication Date
CN113535933A true CN113535933A (en) 2021-10-22
CN113535933B CN113535933B (en) 2023-07-25

Family

ID=78094953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110610809.6A Active CN113535933B (en) 2021-06-01 2021-06-01 Case retrieval method and device, electronic equipment and storage device

Country Status (1)

Country Link
CN (1) CN113535933B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060041607A1 (en) * 2004-08-23 2006-02-23 Miller David J Point of law search system and method
CN106502996A (en) * 2016-12-13 2017-03-15 深圳爱拼信息科技有限公司 A kind of judgement document's search method and server based on semantic matches
CN107908525A (en) * 2017-10-13 2018-04-13 深圳前海微众银行股份有限公司 Alert processing method, equipment and readable storage medium storing program for executing
CN108595547A (en) * 2018-04-09 2018-09-28 南京网感至察信息科技有限公司 A kind of similar case search method based on semantics extraction
CN109408520A (en) * 2018-09-26 2019-03-01 青岛农业大学 A kind of law online updating method, system, equipment and computer program product
CN109739966A (en) * 2018-12-29 2019-05-10 重庆木舌科技有限公司 The method for carrying out legal advice service using case library
CN110019429A (en) * 2017-12-01 2019-07-16 上海百事通信息技术股份有限公司 Legal services system, method, server, equipment and medium
CN110532456A (en) * 2019-06-14 2019-12-03 平安科技(深圳)有限公司 Case querying method, device, computer equipment and storage medium
CN110928994A (en) * 2019-11-28 2020-03-27 北京华宇元典信息服务有限公司 Similar case retrieval method, similar case retrieval device and electronic equipment
CN111125319A (en) * 2019-12-30 2020-05-08 重庆木舌科技有限公司 Enterprise basic law intelligent consultation terminal, system and method
CN111191455A (en) * 2018-10-26 2020-05-22 南京大学 Legal provision prediction method in traffic accident damage compensation
CN111538830A (en) * 2020-04-28 2020-08-14 清华大学 French retrieval method, French retrieval device, computer equipment and storage medium
CN112001162A (en) * 2020-07-31 2020-11-27 银江股份有限公司 Intelligent judging system based on small sample learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MATTHEW C. STAMM; K.J. RAY LIU;: "Forensic detection of image manipulation using statistical intrinsic fingerprints", IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, pages 492 - 506 *
刘澄;夏新恩;: "案例推理在法律咨询***中的应用研究", 韶关学院学报, no. 06, pages 33 - 35 *
马源;任瑞平;: "大数据与司法统一适用问题研究", 法制与经济, pages 172 - 173 *

Also Published As

Publication number Publication date
CN113535933B (en) 2023-07-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant