CN116402644A - Legal supervision method and system based on big data multi-source data fusion analysis

Info

Publication number: CN116402644A
Application number: CN202310140572.9A
Authority: CN
Inventors: 贾俊亮; 谢玉军; 杨凯
Original assignee: Henan Jinmingyuan Information Technology Co ltd
Current assignee: Henan Jinmingyuan Information Technology Co ltd
Priority date: 2023-02-21
Filing date: 2023-02-21
Publication date: 2023-07-07

Abstract

The application provides a legal supervision method and system based on big data multi-source data fusion analysis. The method comprises: directionally collecting big data multi-source data with an adaptive data acquisition model to generate an initial data set; processing the initial data set through an intermediate database to obtain a target data set; spatially clustering the target data set to generate a classified target data set; aggregating the classified target data set with a sequence pattern mining analysis method to fuse its information and obtain a fusion data set; taking the fusion data set as the input of a case clue research and judgment index model, acquiring a parameter set that influences case clue formation, and establishing a mapping relationship between the case output index result and the case early warning level; and pushing cases whose early warning level exceeds a threshold to the related departments. Legal supervision of the cases is thereby realized, case source information is discovered automatically, and case handling efficiency is improved.

Description

Legal supervision method and system based on big data multi-source data fusion analysis

Technical Field

The application relates to the technical field of multi-source data fusion analysis, in particular to a legal supervision method and a legal supervision system based on big data multi-source data fusion analysis.

Background

Multi-source data fusion is a technology that integrates all the information obtained through investigation and analysis, evaluates it in a unified way, and finally produces unified information. Its aim is to integrate different kinds of data, absorb the characteristics of the different data sources, and extract from them unified information that is better and richer than any single data source provides. The effect is often described figuratively as "1+1=3".

Faced with many types of cases, the related departments cannot always acquire case information in time, so cases are not handled in time; or the department that should acquire the case information misses the best time to acquire it, and therefore the best time to handle the case, which causes losses. Improving clue discovery capability and case handling quality and efficiency by technical means has become an unavoidable technical bottleneck in the case handling work of the related departments, and calls for theoretical research and technical breakthroughs.

Therefore, how to overcome the above technical problems and drawbacks has become a major problem to be solved.

Disclosure of Invention

To solve the problems in the prior art that case source clues are difficult to discover and manual case handling is inefficient, the application provides a legal supervision method and system based on big data multi-source data fusion analysis, which adopt the following technical scheme:

In a first aspect, the present application provides a legal supervision method based on big data multi-source data fusion analysis, comprising:

S1, directionally collecting big data multi-source data by adopting an adaptive data acquisition model to generate an initial data set;

S2, processing the initial data set through an intermediate database to obtain a target data set;

S3, performing spatial clustering on the obtained target data set based on the characteristics of different classifications to generate a classified target data set;

S4, performing aggregation analysis on the information of the different elements of the classified target data set by using a sequence pattern mining analysis method, so as to fuse the information of the classified target data set and obtain a fusion data set;

S5, taking the fusion data set as the input of a case clue research and judgment model, acquiring a parameter set that influences case clue formation, and establishing a mapping relationship between the case output result and the case early warning level;

S6, pushing the cases whose case early warning level exceeds the threshold to the related departments, thereby realizing legal supervision of the cases.

Further, in the step S1, the adaptive data acquisition model is used to directionally collect the big data multi-source data, and the adaptive data acquisition model is constructed by transfer learning, specifically as follows:

training an initial data acquisition model using training data;

and, in the application stage, adjusting the initial data acquisition model using additional task-related data to obtain the adaptive data acquisition model, so that it adapts to the target task/target data.
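The two-stage construction above (train an initial model, then adjust it with a small amount of task-related data) can be sketched as follows. This is only an illustration under stated assumptions: the application does not specify the model type, so a tiny logistic-regression relevance scorer is used here, and the feature layout, data, learning rate and epoch counts are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, features):
    """Probability that a document is relevant case source material."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))

def train(samples, weights, lr=0.5, epochs=200):
    """Stochastic gradient descent for logistic regression.

    Starts from `weights` and returns an updated copy, so the same
    function serves for initial training and for later adjustment.
    """
    w = list(weights)
    for _ in range(epochs):
        for features, label in samples:
            error = predict(w, features) - label
            for i, x in enumerate(features):
                w[i] -= lr * error * x
    return w

# Hypothetical features: [bias, mentions_pollution, mentions_contract]
source_data = [([1, 1, 0], 1), ([1, 0, 0], 0), ([1, 1, 1], 1), ([1, 0, 1], 0)]
# Small amount of target-task data: contract disputes are now relevant too.
target_data = [([1, 0, 1], 1), ([1, 0, 0], 0)]

initial = train(source_data, [0.0, 0.0, 0.0])      # initial model
adapted = train(target_data, initial, epochs=50)   # adjusted in the application stage
```

The adjustment starts from the trained weights rather than from zero, which is the point of the transfer step: only the few weights that matter for the target task need to move.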

Further, the data collected in the step S1 in a directional manner is case source data.

Further, the directional collection of the big data multi-source data with the adaptive data acquisition model in the step S1 further includes: identifying and extracting case source information along two dimensions, domain and time, based on semantic features, thereby realizing directional collection of the case source information.
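A minimal sketch of extraction along the domain and time dimensions is given below. It substitutes plain keyword matching and ISO-date matching for the semantic features mentioned in the text, which the application does not detail; the domain vocabulary, the date format and the function names are assumptions made for illustration.

```python
import re
from datetime import date

# Hypothetical domain vocabulary; a real system would use semantic features.
DOMAIN_TERMS = {
    "environment": {"pollution", "discharge"},
    "labor": {"wage arrears", "unpaid wages"},
}
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")  # assumes ISO-style dates

def parse_dates(text):
    """Return every valid YYYY-MM-DD date found in the text."""
    found = []
    for y, m, d in DATE_RE.findall(text):
        try:
            found.append(date(int(y), int(m), int(d)))
        except ValueError:  # e.g. 2023-02-30
            continue
    return found

def extract(text, start, end):
    """Keep a document only if it matches a domain AND a date in [start, end]."""
    lowered = text.lower()
    domains = sorted(
        name for name, terms in DOMAIN_TERMS.items()
        if any(term in lowered for term in terms)
    )
    in_window = [d for d in parse_dates(text) if start <= d <= end]
    if domains and in_window:
        return {"domains": domains, "dates": in_window}
    return None

hit = extract(
    "On 2023-03-05 the plant began illegal discharge into the river.",
    date(2023, 1, 1), date(2023, 12, 31),
)
```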

Further, the processing of the initial data set in the step S2 through an intermediate database includes:

(1) Applying custom filtering rules to URLs, including filtering of out-of-province administrative divisions, monitoring word filtering, exclusion word filtering, part-of-speech filtering, content filtering, domain filtering, URL deduplication, and custom setting of monitoring words/exclusion words;

(2) Exchanging case source data among different databases by setting custom tables and custom libraries, thereby realizing cross-database case source data exchange;

(3) Exchanging case source data among different platforms by configuring the service data of the different platforms, thereby realizing cross-platform case source data exchange.

The processing of the initial data set through the intermediate database further comprises:

cleaning, converting, associating and identifying the obtained initial data set, including data verification, data splitting/merging, sorting, deduplication, filtering, privacy desensitization and stored procedure calls, to complete the processing of the initial data set.
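A few of the processing rules listed above (URL deduplication, monitoring word and exclusion word filtering, privacy desensitization) can be sketched as a single pass over the records. The word lists, the record layout, the URL normalization rule and the 11-digit phone pattern are hypothetical choices for illustration and are not taken from the application.

```python
import re
from urllib.parse import urlsplit

MONITOR_WORDS = {"pollution", "embezzlement"}  # hypothetical monitoring words
EXCLUDE_WORDS = {"advertisement"}              # hypothetical exclusion words
PHONE_RE = re.compile(r"\b\d{11}\b")           # assumed phone number format

def normalize_url(url):
    """Lower-case scheme and host, drop query string and trailing slash."""
    parts = urlsplit(url.strip())
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}"

def clean(records):
    """Filter, deduplicate and desensitize raw records."""
    seen = set()
    kept = []
    for rec in records:
        key = normalize_url(rec["url"])
        text = rec["text"].lower()
        if key in seen:
            continue  # URL deduplication
        if not any(word in text for word in MONITOR_WORDS):
            continue  # no monitoring word present
        if any(word in text for word in EXCLUDE_WORDS):
            continue  # exclusion word present
        seen.add(key)
        kept.append({"url": key, "text": PHONE_RE.sub("*" * 11, rec["text"])})
    return kept

sample = [
    {"url": "HTTP://Example.com/a/", "text": "Pollution report, contact 13800000000"},
    {"url": "http://example.com/a", "text": "Pollution report duplicate"},
    {"url": "http://example.com/b", "text": "Advertisement about pollution masks"},
    {"url": "http://example.com/c", "text": "Weather today"},
]
cleaned = clean(sample)
```

Only the first record survives: the second is a duplicate URL after normalization, the third contains an exclusion word, and the fourth contains no monitoring word.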

Further, the spatial clustering of the acquired target data set in the step S3 based on the characteristics of different classifications includes:

performing similarity analysis on the case contents in the acquired target data set, using a similarity analysis formula over the following quantities:

wherein C₁ and C₂ denote two case contents, sim(C₁, C₂) denotes the similarity between case content C₁ and case content C₂, Dis(C₁, C₂) denotes the distance between case content C₁ and case content C₂, and σ is an adjustable parameter;

when case content C₁ and case content C₂ have the same semantics, their similarity value is 1, and when their semantics are completely different, their similarity value is close to 0.
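The similarity formula itself is not reproduced in this text, so the sketch below assumes one common form that matches the stated behavior: sim = σ / (Dis + σ), which equals 1 at distance 0 and approaches 0 as the distance grows. The token-based distance and the greedy threshold clustering are likewise hypothetical stand-ins, not the method of the application.

```python
def distance(a, b):
    """Hypothetical distance: number of tokens found in only one of the texts."""
    return len(set(a.split()) ^ set(b.split()))

def similarity(a, b, sigma=1.0):
    """Assumed form sim = sigma / (Dis + sigma); sigma is the adjustable parameter."""
    return sigma / (distance(a, b) + sigma)

def cluster(texts, threshold=0.5, sigma=1.0):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []
    for text in texts:
        for members in clusters:
            if similarity(text, members[0], sigma) >= threshold:
                members.append(text)
                break
        else:
            clusters.append([text])
    return clusters

cases = [
    "illegal waste discharge river",
    "illegal waste discharge lake",
    "unpaid wages factory",
]
groups = cluster(cases, threshold=0.5, sigma=2.0)
```

A larger σ makes the similarity decay more slowly with distance, so it loosens the clusters; a smaller σ tightens them.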

Further, in the step S4, aggregating the information of the case clues in the classified case information base by using the sequence pattern mining analysis method to realize case clue information fusion includes:

performing sequence pattern mining analysis on the region, time, case category, case procedure, damage result and violation type obtained from the classification database, and aggregating, according to these elements, the data in the classification database that belongs to the same case clue under different time dimensions, thereby realizing deep fusion of the data.
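The aggregation of records that belong to the same case clue across different points in time can be sketched as a grouping step followed by time ordering. This is a simplification: it groups on matching element values and does not implement a full sequential pattern mining algorithm. The record fields and sample values are hypothetical.

```python
from collections import defaultdict

def fuse(records):
    """Merge records that share region, category, procedure and violation type."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["region"], rec["category"], rec["procedure"], rec["violation"])
        groups[key].append(rec)

    fused = []
    for (region, category, procedure, violation), items in groups.items():
        ordered = sorted(items, key=lambda r: r["time"])  # ISO dates sort as text
        fused.append({
            "region": region,
            "category": category,
            "procedure": procedure,
            "violation": violation,
            "first_seen": ordered[0]["time"],
            "last_seen": ordered[-1]["time"],
            "damage_sequence": [r["damage"] for r in ordered],
            "sources": sorted({r["source"] for r in ordered}),
        })
    return fused

records = [
    {"region": "A", "time": "2023-03-01", "category": "environment",
     "procedure": "administrative", "damage": "river contaminated",
     "violation": "illegal discharge", "source": "news"},
    {"region": "A", "time": "2023-01-15", "category": "environment",
     "procedure": "administrative", "damage": "odor complaints",
     "violation": "illegal discharge", "source": "hotline"},
    {"region": "B", "time": "2023-02-10", "category": "labor",
     "procedure": "civil", "damage": "unpaid wages",
     "violation": "wage arrears", "source": "forum"},
]
fused = fuse(records)
```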

Further, taking the fusion data set as the input of the case clue research and judgment index model in the step S5 includes:

the case clue research and judgment model obtains the research and judgment data value of each element according to the region, time, case category, case procedure, damage result and violation type described in each case clue, calculates the research and judgment result value of the case clue based on the weight values of the different elements, and takes that result value as the case output result;

the region, time, case category, case procedure, damage result and violation type of each case clue in the fusion data set are input into the case clue research and judgment model to obtain the set of research and judgment data values (x₁, x₂, x₃, x₄, x₅, x₆) for these elements; this set is taken as the parameter set that influences case clue formation and is combined with the weight values assigned to the different elements to output the result data of the case clue, the research and judgment result being calculated as follows:

Y = α₁x₁ + α₂x₂ + α₃x₃ + α₄x₄ + α₅x₅ + α₆x₆

wherein Y is the research and judgment result value of the case clue; α₁ is the weight value of the region in the case clue, α₂ is the weight value of the time, α₃ is the weight value of the case category, α₄ is the weight value of the case procedure, α₅ is the weight value of the damage result, and α₆ is the weight value of the violation type; x₁ is the research and judgment data value of the region in the case clue, x₂ that of the time, x₃ that of the case category, x₄ that of the case procedure, x₅ that of the damage result, and x₆ that of the violation type.
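The weighted sum for Y can be computed directly from the six element values and their weights. The weights and data values below are made-up numbers used only to show the arithmetic; the application does not state how either is obtained.

```python
# Element order follows x1..x6 in the formula for Y.
ELEMENTS = ("region", "time", "category", "procedure", "damage", "violation")

def judgment_score(values, weights):
    """Y = sum of weight * data value over the six elements."""
    return sum(weights[name] * values[name] for name in ELEMENTS)

# Hypothetical weights (alpha1..alpha6) and data values (x1..x6).
weights = {"region": 0.1, "time": 0.1, "category": 0.2,
           "procedure": 0.1, "damage": 0.3, "violation": 0.2}
values = {"region": 5, "time": 4, "category": 8,
          "procedure": 6, "damage": 9, "violation": 7}

# 0.5 + 0.4 + 1.6 + 0.6 + 2.7 + 1.4 = 7.2 (up to floating-point rounding)
y = judgment_score(values, weights)
```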

Further, in the step S5, a mapping relationship between the case output result and the case early warning level is established, where the mapping relationship is that the higher the case output result is, the higher the case early warning level is.
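One way to realize a mapping in which a higher case output result gives a higher early warning level is a set of ascending cut points, followed by the threshold check used when pushing cases in step S6. The cut points, the number of levels and the threshold are hypothetical; the application only requires that the mapping be increasing.

```python
from bisect import bisect_right

# Hypothetical ascending cut points separating levels 0, 1, 2 and 3.
LEVEL_BOUNDS = [3.0, 6.0, 8.0]

def warning_level(score):
    """Map a case output result to a level; a score on a cut point takes the higher level."""
    return bisect_right(LEVEL_BOUNDS, score)

def cases_to_push(scored_cases, threshold=1):
    """Return the ids of cases whose warning level exceeds the threshold."""
    return [case_id for case_id, score in scored_cases
            if warning_level(score) > threshold]

scored = [("case-A", 7.2), ("case-B", 2.0), ("case-C", 9.5)]
pushed = cases_to_push(scored, threshold=1)
```

Because the cut points are sorted, the mapping is non-decreasing in the score by construction, which satisfies the stated relationship.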

In a second aspect, the present application also provides a legal supervision system based on big data multi-source data fusion analysis, comprising: an initial data set acquisition module, an initial data set processing module, a target data set classification module, a classified target data set fusion module, a classified target data set processing module and a legal supervision module;

the initial data set acquisition module is used for directionally acquiring the big data multi-source data by adopting the self-adaptive data acquisition model to generate an initial data set;

the initial data set processing module is used for processing the initial data set through an intermediate database to obtain a target data set;

the target data set classification module is used for carrying out spatial clustering on the acquired target data set based on the characteristics of different classifications to generate a classified target data set;

the classification target data set fusion module is used for carrying out aggregation analysis on the information of different elements of the classification target data set by utilizing a sequence pattern mining analysis method so as to realize information fusion of the classification target data set and acquire a fusion data set;

the classification target data set processing module is used for taking the fusion data set as the input of a case clue research and judgment index model, acquiring a parameter set that influences case clue formation, and establishing a mapping relationship between the case output index result and the case early warning level;

the legal supervision module is used for pushing the cases with the case early warning level exceeding a threshold value to related departments to realize legal supervision of the cases.

In a third aspect, the present application provides an electronic device, including:

one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions, which when executed by the device, cause the device to perform the method of the first aspect.

In a fourth aspect, the present application provides a computer readable storage medium having a computer program stored therein, which when run on a computer causes the computer to perform the method according to the first aspect.

In a fifth aspect, the present application provides a computer program for performing the method of the first aspect when the computer program is executed by a computer.

In one possible design, the program in the fifth aspect may be stored in whole or in part on a storage medium packaged with the processor, or in part or in whole on a memory not packaged with the processor.

Compared with the prior art, the embodiment of the application has the following main beneficial effects:

1. The adaptive data acquisition model provided by the application directionally collects case source information from multi-source data, and the source model can be generalized to the target task using a small amount of adaptation data.

2. By directionally collecting case source information from multi-source data, the adaptive data acquisition model provided by the application enables intelligent discovery of case source information and improves case clue discovery capability.

3. The application fuses case clue information through information aggregation analysis using the sequence pattern mining analysis method, which improves the understanding of case clues.

4. The application pushes cases whose early warning level exceeds the threshold to the related departments, which saves those departments the time spent searching for cases and improves case handling efficiency.

Drawings

FIG. 1 is an exemplary system architecture diagram in which embodiments of the present application may be applied;

FIG. 2 is a method flow diagram of an embodiment of the present application;

FIG. 3 is a schematic diagram of transfer learning according to an embodiment of the present application;

FIG. 4 is a process schematic of an intermediate database of an embodiment of the present application;

FIG. 5 is a schematic illustration of an apparatus of an embodiment of the present application;

FIG. 6 is a schematic diagram of a computer device according to an embodiment of the present application.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description and claims of the present application and in the description of the figures above are intended to cover non-exclusive inclusions. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

In order to better understand the technical solutions of the present application, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings.

As shown in FIG. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various communication client applications, such as web browser applications, shopping applications, search applications, instant messaging tools, mailbox clients, and social platform software, may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (MPEG-4 Part 14) players, laptop computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background server providing support for the pages displayed on the terminal devices 101, 102, 103.

It should be noted that, the legal supervision method based on the multi-source data fusion analysis of big data provided by the embodiment of the application is generally executed by a server/terminal device, and correspondingly, the legal supervision system based on the multi-source data fusion analysis of big data is generally arranged in the server/terminal device.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, there is shown a flow chart of a legal supervision method for big data based multi-source data fusion analysis of the present application, the method comprising the steps of:

The method uses an adaptive data acquisition model to directionally collect big data multi-source data, and the adaptive data acquisition model is constructed by transfer learning; see FIG. 3 for details:

Step 301, training an initial information acquisition model using training data.

Case files are collected and converted into electronic case files using image recognition technology. Natural language processing is then applied to the structured electronic case files to perform automatic word segmentation, part-of-speech tagging and named entity recognition and to extract the keywords of the case elements. The keywords are combined in a certain order according to the relations among them to construct a knowledge base and a case database that conform to the business knowledge system, and the knowledge base and the case database serve as the training data of the initial information acquisition model.

Step 302, in the application stage, the initial information acquisition model is adjusted using additional task-related data to obtain the adaptive data acquisition model, so that it better adapts to the target task/target data.

Directionally collecting multi-source case source information through the adaptive data acquisition model further includes: identifying and extracting case source information along two dimensions, domain and time, based on semantic features, thereby realizing directional collection of the case source information.

The initial data set is processed by the intermediate database; see FIG. 4 for details:

In FIG. 4, 4a represents the processing of the obtained URL data, specifically applying custom filtering rules to URLs, including filtering of out-of-province administrative divisions, monitoring word filtering, exclusion word filtering, part-of-speech filtering, content filtering, domain filtering, URL deduplication, and custom setting of monitoring words/exclusion words; 4b represents case source exchange between different databases, specifically exchanging case source data among different databases by setting custom tables and custom libraries, so as to realize cross-database case source data exchange; 4c represents case source exchange between different platforms, specifically exchanging case source data among different platforms by configuring the service data of the different platforms, so as to realize cross-platform case source data exchange.

The spatial clustering of the acquired target data set based on the characteristics of different classifications includes performing similarity analysis on the case contents in the acquired target data set using a similarity analysis formula.

When case content C₁ and case content C₂ have the same semantics, their similarity value is 1, and when their semantics are completely different, their similarity value is close to 0.

S5, taking the fusion data set as the input of a case clue research and judgment model, acquiring a parameter set that influences case clue formation, and establishing a mapping relationship between the case output result and the case early warning level.

Taking the fusion data set as the input of the case clue research and judgment index model in step S5 includes:

In step S5, a mapping relationship is established between the case output result and the case early warning level: the higher the case output result, the higher the case early warning level.

Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored in a computer-readable storage medium; when executed, the program may comprise the steps of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or it may be a random access memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.

With continued reference to FIG. 5, the legal supervision system based on big data multi-source data fusion analysis according to the present embodiment includes: an initial data set acquisition module 501, an initial data set processing module 502, a target data set classification module 503, a classification target data set fusion module 504, a classification target data set processing module 505, and a legal supervision module 506;

the initial data set acquisition module 501 is configured to directionally collect big data multi-source data by using an adaptive data acquisition model, so as to generate an initial data set;

the initial data set processing module 502 is configured to obtain a target data set after the initial data set is processed by an intermediate database;

the target data set classification module 503 is configured to spatially cluster the obtained target data set based on characteristics of different classifications, and generate a classified target data set;

the classification target data set fusion module 504 is configured to aggregate and analyze information of different elements of the classification target data set by using a sequence pattern mining analysis method, so as to realize information fusion of the classification target data set and obtain a fusion data set;

the classification target data set processing module 505 is configured to take the fusion data set as the input of a case clue research and judgment index model, obtain a parameter set that influences case clue formation, and establish a mapping relationship between the case output index result and the case early warning level;

the legal supervision module 506 is configured to push the case with the case early warning level exceeding a threshold to a related department, so as to implement legal supervision on the case.

In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 6, fig. 6 is a basic structural block diagram of a computer device according to the present embodiment.

The computer device 6 comprises a memory 6a, a processor 6b and a network interface 6c communicatively connected to each other via a system bus. It should be noted that only a computer device 6 having components 6a-6c is shown in the figure, but it should be understood that not all of the illustrated components need be implemented, and that more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to predetermined or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, and the like.

The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.

The memory 6a includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and the like. In some embodiments, the memory 6a may be an internal storage unit of the computer device 6, such as a hard disk or internal memory of the computer device 6. In other embodiments, the memory 6a may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 6. Of course, the memory 6a may also comprise both an internal storage unit of the computer device 6 and an external storage device. In this embodiment, the memory 6a is generally used to store the operating system and the various application software installed on the computer device 6, such as the program code of the legal supervision method and system based on big data multi-source data fusion analysis. The memory 6a may also be used to temporarily store various types of data that have been output or are to be output.

The processor 6b may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 6b is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 6b is configured to execute the program code stored in the memory 6a or process data, such as the program code of the legal supervision method and system for multi-source data fusion analysis based on big data.

The network interface 6c may comprise a wireless network interface or a wired network interface, which network interface 6c is typically used to establish a communication connection between the computer device 6 and other electronic devices.

The present application also provides another embodiment, namely a non-volatile computer-readable storage medium storing a program of the legal supervision method and system based on big data multi-source data fusion analysis; the program can be executed by at least one processor, so that the at least one processor performs the steps of the legal supervision method based on big data multi-source data fusion analysis described above.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method described in the embodiments of the present application.

It is apparent that the embodiments described above are only some, not all, of the embodiments of the present application. The drawings show preferred embodiments of the present application but do not limit its patent scope. The application may be embodied in many different forms; these embodiments are provided so that the present disclosure will be understood more thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the technical solutions described in the foregoing embodiments may be modified, or equivalents may be substituted for some of their elements. All equivalent structures made using the specification and drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the protection scope of the application.

Claims

1. The legal supervision method based on big data multi-source data fusion analysis is characterized by comprising the following steps:

2. The legal supervision method based on big data multi-source data fusion analysis according to claim 1, wherein in the step S1 the big data multi-source data is directionally collected by adopting an adaptive data acquisition model, and the adaptive data acquisition model is constructed by transfer learning, specifically as follows:

training an initial data acquisition model using training data;

3. The legal supervision method based on big data multi-source data fusion analysis according to claim 2, wherein the directionally collected data in the step S1 is case source data.

4. The legal supervision method based on big data multi-source data fusion analysis according to claim 3, wherein directionally collecting the big data multi-source data by adopting the adaptive data acquisition model in the step S1 further comprises: identifying and extracting case source information along two dimensions, domain and time, based on semantic features, thereby realizing directional collection of the case source information.

5. The legal supervision method based on big data multisource data fusion analysis according to claim 1, wherein the processing of the initial data set in step S2 through an intermediate database comprises:

6. The legal supervision method based on big data multisource data fusion analysis according to claim 1, wherein the spatial clustering of the acquired target data set in the step S3 based on the characteristics of different classifications comprises:

wherein C₁ and C₂ denote two case contents, sim(C₁, C₂) denotes the similarity between case content C₁ and case content C₂, Dis(C₁, C₂) denotes the distance between case content C₁ and case content C₂, and σ is an adjustable parameter;

7. The legal supervision method based on big data multi-source data fusion analysis according to claim 1, wherein in the step S4, aggregating the information of the case clues in the classified case information base by using a sequence pattern mining analysis method to realize case clue information fusion comprises:

8. A legal supervision system based on big data multi-source data fusion analysis, characterized by comprising: an initial data set acquisition module, an initial data set processing module, a target data set classification module, a classified target data set fusion module, a classified target data set processing module and a legal supervision module;

9. An electronic device, comprising:

one or more processors, memory, and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions, which when executed by the device, cause the device to perform the method of any of claims 1-7.

10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when run on a computer, causes the computer to perform the method according to any of claims 1 to 7.