CN113127060A - Software function point identification method based on natural language pre-training model (BERT)

Info

Publication number: CN113127060A
Application number: CN202110386325.8A
Authority: CN
Inventors: 仲兆祥; 袁华新; 张笑闻; 郭琼琼; 朱玉
Original assignee: China Communication Service Application And Solution Technology Co ltd
Current assignee: China Communication Service Application And Solution Technology Co ltd
Priority date: 2021-04-09
Filing date: 2021-04-09
Publication date: 2021-07-16

Abstract

The invention discloses a software function point identification method based on a natural language pre-training model (BERT), comprising the following steps: acquiring at least one requirement description sentence; inputting the at least one requirement description sentence into a trained named entity recognition model to obtain at least one named entity; performing word segmentation on the at least one requirement description sentence to obtain a word segmentation set comprising at least one segmented word; merging the at least one named entity with the segmented words in the word segmentation set, and performing part-of-speech tagging on the merged result; and processing the part-of-speech tagging result to identify the function points. The method requires no manual evaluation and identifies function points quickly and accurately.

Description

Software function point identification method based on natural language pre-training model (BERT)

Technical Field

The invention belongs to the technical field of function point identification, and particularly relates to a function point identification method, a function point identification device, a function point identification application and a computer device.

Background

The function point method is a method for estimating the size of a software project. It measures software scale by quantifying the functions of a system from the user's point of view, and the measurement is based mainly on the logical design of the system. Function point size measurement is widely applied internationally and has replaced lines of code as the mainstream method of measuring software size. The function point method is a decomposition-type size measurement method: a complex system is decomposed by component into smaller subsystems, which are evaluated to determine the number of function points of the system.

Estimation with the function point method requires an evaluator to read the software requirement document manually, identify the function points in it using professional knowledge, and calculate the software research and development cost from industry productivity benchmark data and the like.

Identifying software function points with the existing function point method therefore places high demands on evaluators, who must master the function point method itself as well as professional knowledge of both the software application field and the software development field. As a result, overall evaluation efficiency is low.

Disclosure of Invention

To solve the above problems, the invention provides a function point identification method, device, application and computer device. A named entity recognition model identifies the named entities in a requirement description sentence, and the named entities are merged with the segmented words of the sentence to identify function points. No manual evaluation is needed, and function points are identified quickly and accurately.

The invention is realized by the following technical scheme:

In a first aspect, the present invention provides a method for identifying a function point, including:

acquiring at least one requirement description sentence;

inputting the at least one requirement description sentence into a trained named entity recognition model to obtain at least one named entity;

performing word segmentation on the at least one requirement description sentence to obtain a word segmentation set, wherein the word segmentation set comprises at least one segmented word;

merging the at least one named entity with the segmented words in the word segmentation set, and performing part-of-speech tagging on the merged result;

and processing the part-of-speech tagging result to identify the functional point.

According to the method, the named entities in the requirement description sentences are identified by the named entity recognition model, the named entities are merged with the segmented words of the requirement description sentences, and the part-of-speech tagging result of the merged result is analyzed to identify the function points.

In one possible design, the processing the part-of-speech tagging result to identify the functional point includes:

performing dependency syntax analysis on the part of speech tagging result to obtain the dependency relationship between words;

the functional points are identified according to the dependency relationships.

In one possible design, the named entity recognition model includes a bidirectional pre-trained language model BERT and a conditional random field (CRF) connected in sequence. Software requirements span many fields, and in some fields the available corpus is limited, so recognition quality is unstable. The scheme therefore uses a named entity recognition model in which a CRF layer is added on top of BERT as the main layer. BERT learns the conditional probabilities of the named entity labels in the professional field, and the CRF learns the transition probabilities between named entity labels. A conditional random field is an undirected graphical model and a typical discriminative model that combines characteristics of the maximum entropy model and the hidden Markov model; because it uses global normalization, it avoids the label bias problem of models such as maximum entropy and hidden Markov models. Combining the two effectively improves recognition stability and cross-domain recognition capability. Compared with existing models based on maximum entropy, conditional random fields, recurrent neural networks and similar techniques, the model has a novel structure, stable recognition and strong cross-domain recognition capability.
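As a rough illustration of how a CRF layer can combine per-token label scores with label transition scores, the following sketch decodes the best label sequence with the Viterbi algorithm. The emission scores stand in for the output of the BERT encoder. The function name `viterbi_decode`, the BIO-style label names and all numbers are assumptions made for illustration and are not taken from the patent.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the highest-scoring label sequence.

    emissions[t][label] is the per-token score (assumed to come from BERT);
    transitions[prev][curr] is the label-to-label score held by the CRF.
    """
    labels = list(emissions[0])
    # best[t][label] = best score of any path that ends in `label` at step t
    best = [dict(emissions[0])]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for curr in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[p][curr])
            scores[curr] = best[-1][prev] + transitions[prev][curr] + emissions[t][curr]
            pointers[curr] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    last = max(labels, key=lambda lab: best[-1][lab])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical scores for a two-token sentence.
LABELS = ["O", "B-ENT", "I-ENT"]
transitions = {p: {c: 0.0 for c in LABELS} for p in LABELS}
transitions["O"]["I-ENT"] = -10.0  # an I- tag should not follow O
emissions = [
    {"O": 1.0, "B-ENT": 0.9, "I-ENT": 0.0},
    {"O": 0.5, "B-ENT": 0.0, "I-ENT": 1.0},
]
decoded = viterbi_decode(emissions, transitions)
```

Picking the best label for each token independently would give `O` followed by `I-ENT` here; the transition penalty makes the decoder prefer `B-ENT` followed by `I-ENT`, which is the kind of sequence-level consistency the CRF layer is meant to contribute.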

In one possible design, the method for training the bidirectional pre-trained language model BERT includes:

pre-training the BERT model on a training set, wherein the training set comprises multi-field corpus data, and each item of the multi-field corpus data comprises at least one requirement description sentence;

and fine-tuning the pre-trained BERT model with corpus data of a professional field, wherein the corpus data of the professional field comprises at least one requirement description sentence of that field.

According to this scheme, the BERT model is pre-trained on multi-field requirement description sentences and then fine-tuned on requirement description sentences of a professional field, which improves the accuracy of the conditional probabilities that the model assigns to the named entity labels in that field.

In one possible design, the function points include at least one of an internal logical file (ILF), an external interface file (EIF), an external input (EI), an external output (EO), and an external inquiry (EQ).

In a second aspect, the present invention provides a software development cost estimation method, including the following steps:

identifying the function points in the software requirement document by adopting any one of the function point identification methods in the first aspect;

and calculating software research and development cost based on the function points and the industry benchmark data.

In a third aspect, the invention provides a function point identification device, which comprises a sentence acquisition module, a named entity acquisition module, a merging module, a part-of-speech tagging module and a function point identification module, which are sequentially connected in a communication manner, wherein a word segmentation module is also connected between the sentence acquisition module and the merging module in a communication manner;

the sentence acquisition module is used for acquiring at least one requirement description sentence;

the named entity acquisition module is used for inputting the at least one requirement description sentence into the trained named entity recognition model to obtain at least one named entity;

the word segmentation module is used for performing word segmentation on the at least one requirement description sentence to obtain a word segmentation set, and the word segmentation set comprises at least one segmented word;

the merging module is used for merging the at least one named entity with at least one segmented word in the word segmentation set;

the part-of-speech tagging module is used for carrying out part-of-speech tagging on the merged result;

and the functional point identification module is used for processing the part-of-speech tagging result and identifying the functional point.

In a fourth aspect, the present invention provides a software development cost estimation device, which comprises a function point identification device and a cost accounting module, which are sequentially connected in a communication manner,

the function point identification device is the function point identification device provided in the third aspect;

and the cost accounting module is used for calculating the software research and development cost according to the function points and the industry benchmark data.

In a fifth aspect, the present invention provides a computer device, comprising a memory and a processor, wherein the memory is used for storing a computer program, and the processor is used for reading the computer program and executing the function point identification method according to the first aspect or the software development cost estimation method according to the second aspect.

In a sixth aspect, the present invention provides a computer-readable storage medium, having stored thereon instructions, which, when executed on a computer, perform the method for identifying a function point according to the first aspect or the method for estimating a software development cost according to the second aspect.

Compared with the prior art, the invention at least has the following advantages and beneficial effects:

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of a functional point identification method of the present invention;

FIG. 2 is an identification diagram of a specific requirement description statement of the present invention;

FIG. 3 is a diagram of a named entity recognition model architecture;

FIG. 4 is a flow chart of a software development cost estimation method of the present invention;

FIG. 5 is a schematic diagram of a function point identification device.

Detailed Description

The invention is further described with reference to the following figures and specific embodiments. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. Specific structural and functional details disclosed herein are merely illustrative of example embodiments of the invention. This invention may, however, be embodied in many alternate forms and should not be construed as limited to the embodiments set forth herein.

It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments of the present invention.

It should be understood that the term "and/or", where it appears herein, merely describes an association between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, B exists alone, or A and B exist at the same time. The term "/and", where it appears herein, describes another association and means that two relationships may exist; for example, A/and B may mean: A exists alone, or A and B exist at the same time. In addition, the character "/", where it appears herein, generally means that the associated objects before and after it are in an "or" relationship.

It will be understood that when an element is referred to herein as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Conversely, if an element is referred to herein as being "directly connected" or "directly coupled" to another element, no intervening elements are present. Other words used to describe the relationship between elements should be interpreted in a similar manner (e.g., "between" versus "directly between", "adjacent" versus "directly adjacent", etc.).

It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, and/or groups thereof.

It should also be noted that, in some alternative designs, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed substantially concurrently, or the figures may sometimes be executed in the reverse order, depending upon the functionality/acts involved.

It should be understood that specific details are provided in the following description to facilitate a thorough understanding of example embodiments. However, it will be understood by those of ordinary skill in the art that the example embodiments may be practiced without these specific details. For example, systems may be shown in block diagrams in order not to obscure the examples in unnecessary detail. In other instances, well-known processes, structures and techniques may be shown without unnecessary detail in order to avoid obscuring example embodiments.

A first aspect of the present embodiment provides a method for identifying a function point, where the method may be performed by an identifying device, where the identifying device may be software, or a combination of software and hardware, and the identifying device may be integrated in a server, a terminal device, or the like. Specifically, as shown in fig. 1, the method for identifying a function point includes the following steps S101 to S105:

Step S101: acquire at least one requirement description sentence. The requirement description sentence can be a clean requirement description sentence that has already been annotated and refined, or one obtained by preprocessing a requirement document. If a requirement document is obtained, it is preprocessed by reading its functional requirement part to obtain the requirement description sentences. The following description uses the requirement description sentence in FIG. 2 as an example: "The resource management system provides a machine calendar card management function for the in-network devices of each specialty, and allows an operator to record, modify, delete, query and view the corresponding information."

Step S102: input the at least one requirement description sentence into the trained named entity recognition model to obtain at least one named entity. Named entities fall into five categories: entity, operation, attribute, function, and other. An entity is a business entity or a corresponding general term, such as a user or an account. An operation is an action on an entity, including data operations such as adding, deleting, modifying and querying, and domain-specific operation terms such as opening or closing an account. An attribute is the name of an attribute of a business entity, such as a user's password or phone number. A function is a composite business function, such as a daily report. Other covers modifiers and qualifiers, such as "consumer" in "consumer work orders".
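The patent does not state how the model's per-token output is turned into named entities. A minimal sketch, assuming the model emits BIO-style tags (an assumption, as are the tag names and the function name `bio_to_entities`), groups tagged tokens into (text, category) pairs using the five categories listed above:

```python
from typing import List, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (text, category) pairs."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    category = ""
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush the one in progress, if any.
            if current:
                entities.append((" ".join(current), category))
            current, category = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == category:
            # Continuation of the entity in progress.
            current.append(token)
        else:
            # "O" tag or inconsistent "I-" tag: close the entity in progress.
            if current:
                entities.append((" ".join(current), category))
            current, category = [], ""
    if current:
        entities.append((" ".join(current), category))
    return entities


# Hypothetical tagger output for "query consumer work orders".
tokens = ["query", "consumer", "work", "orders"]
tags = ["B-OPERATION", "B-OTHER", "B-ENTITY", "I-ENTITY"]
entities = bio_to_entities(tokens, tags)
```

With this input, `entities` holds the operation "query", the qualifier "consumer" and the two-token entity "work orders".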

The named entity recognition model used in step S102 may be implemented in various ways. In this embodiment, it is preferably formed from a bidirectional pre-trained language model BERT and a conditional random field (CRF); the structure of the resulting named entity recognition model is shown in FIG. 3.

The named entity recognition model must be trained before use, and it can be trained in several ways. In the first way, the BERT model is pre-trained on a training set comprising multi-field corpus data, where each item of the multi-field corpus data comprises at least one requirement description sentence. The training set can use massive Wikipedia data. Because the corpus covers many fields and is large, the BERT model acquires a large amount of prior knowledge, which helps improve the recognition performance and cross-domain recognition capability of the model. In the second way, the BERT model is first pre-trained on the same kind of multi-field training set and is then fine-tuned with at least one item of corpus data of a professional field, where each item comprises at least one requirement description sentence of that field. The professional field is the field of the requirement description sentences in the actual application, that is, the field of the requirement description sentence in step S101. For example, if function points are to be identified in a requirement document for the building field, requirement description sentences from the building field are used as the fine-tuning training set for the BERT model.

Taking the requirement description sentence in FIG. 2 as an example, the sentence is input into the named entity recognition model obtained with the second training way. The named entities obtained are of the categories entity, operation, attribute, function and other, and specifically include: machine calendar card, management, record, modify, delete, query and view.

Step S103: perform word segmentation on the at least one requirement description sentence to obtain a word segmentation set, wherein the word segmentation set comprises at least one segmented word.

In the step, if the requirement description statement is Chinese, Chinese word segmentation processing is carried out on the requirement description statement; and if the requirement description sentence is English, carrying out English word segmentation on the requirement description sentence.
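The patent does not name a segmentation algorithm. As one possible illustration, the sketch below dispatches on the script of the sentence and uses forward maximum matching against a small dictionary for Chinese and a simple regular expression for English. The function names, the dictionary contents and the choice of forward maximum matching are all assumptions; a production system would more likely rely on an established segmentation library.

```python
import re
from typing import List, Set


def segment_english(sentence: str) -> List[str]:
    """Split an English sentence into lowercase word tokens."""
    return re.findall(r"[A-Za-z0-9]+", sentence.lower())


def segment_chinese(sentence: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest dictionary word at each position."""
    words: List[str] = []
    i = 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def segment(sentence: str, vocab: Set[str]) -> List[str]:
    """Use Chinese segmentation if the sentence contains CJK characters."""
    if re.search(r"[\u4e00-\u9fff]", sentence):
        return segment_chinese(sentence, vocab)
    return segment_english(sentence)


# Hypothetical dictionary and inputs.
vocab = {"资源", "管理", "系统"}
chinese_words = segment("资源管理系统", vocab)
english_words = segment("Allow operators to query records.", vocab)
```

Here `chinese_words` is split into the three dictionary words, and `english_words` contains the five lowercase words of the English sentence.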

Step S104: merge the at least one named entity with the segmented words in the word segmentation set, and perform part-of-speech tagging on the merged result. Part-of-speech tagging may be based on a hidden Markov model (HMM), a conditional random field (CRF), a maximum entropy Markov model (MEMM), a recurrent neural network (RNN), or a support vector machine (SVM).
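How the merge is carried out is not spelled out in the patent. One plausible reading, sketched below under that assumption, is that any segmented words covered by a recognized named entity are replaced by the entity as a single unit, so that the part-of-speech tagger sees multi-word entities whole. The character-offset representation and the name `merge_entities` are illustrative choices, not the patent's.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # [start, end) character offsets into the sentence


def merge_entities(
    sentence: str, token_spans: List[Span], entity_spans: List[Span]
) -> List[str]:
    """Replace every run of tokens covered by a named entity with the entity text."""
    merged: List[str] = []
    emitted = set()
    for start, end in token_spans:
        # Find a named entity that overlaps this token, if there is one.
        hit = next((e for e in entity_spans if start < e[1] and e[0] < end), None)
        if hit is None:
            merged.append(sentence[start:end])
        elif hit not in emitted:
            # Emit the entity once, at the position of its first token.
            merged.append(sentence[hit[0]:hit[1]])
            emitted.add(hit)
    return merged


# Hypothetical input: whitespace tokens plus one two-word entity.
sentence = "query consumer work orders"
token_spans = [m.span() for m in re.finditer(r"\S+", sentence)]
entity_start = sentence.index("work orders")
entity_spans = [(entity_start, entity_start + len("work orders"))]
merged = merge_entities(sentence, token_spans, entity_spans)
```

The merged sequence keeps "query" and "consumer" as they were and joins "work" and "orders" into the single unit "work orders".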

Step S105: process the part-of-speech tagging result to identify the function points. Specifically, dependency syntax analysis is performed on the part-of-speech tagging result to obtain the dependency relationships between words, and the function points are identified according to the dependency relationships.
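The patent does not say which dependency relations are used or how they lead to function points. The sketch below shows one way the dependency relationships could be exploited, assuming the parser returns (head, relation, dependent) arcs with `dobj` for verb-object and `conj` for coordinated verbs attached to the first verb. The relation labels, the arcs and the function name `operation_entity_pairs` are hypothetical.

```python
from typing import List, Set, Tuple

Arc = Tuple[str, str, str]  # (head word, relation, dependent word)


def operation_entity_pairs(
    arcs: List[Arc], operations: Set[str], entities: Set[str]
) -> List[Tuple[str, str]]:
    """Pair each operation verb with the entity it acts on."""
    # Direct verb-object arcs between an operation and an entity.
    objects = {
        head: dep
        for head, rel, dep in arcs
        if rel == "dobj" and head in operations and dep in entities
    }
    # Coordinated verbs ("record, modify and delete X") inherit the object
    # of the first verb they are conjoined with.
    for head, rel, dep in arcs:
        if rel == "conj" and head in objects and dep in operations:
            objects.setdefault(dep, objects[head])
    return sorted(objects.items())


# Hand-written arcs for "record, modify, delete and query the device card".
arcs = [
    ("record", "dobj", "device card"),
    ("record", "conj", "modify"),
    ("record", "conj", "delete"),
    ("record", "conj", "query"),
]
pairs = operation_entity_pairs(
    arcs,
    operations={"record", "modify", "delete", "query"},
    entities={"device card"},
)
```

Each of the four operations ends up paired with "device card", which gives one candidate function point per operation-entity pair.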

The function points identified in step S105 include at least one of an internal logical file (ILF), an external interface file (EIF), an external input (EI), an external output (EO), and an external inquiry (EQ). Taking the requirement description sentence in FIG. 2 as an example, the identified function points include an internal logical file (ILF), an external input (EI), an external output (EO), and an external inquiry (EQ); the detailed identification result is shown in FIG. 2.
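The five function point types can be represented as an enumeration, and operation-entity pairs can be labeled with a rule table. The rule table below is purely hypothetical: the patent does not disclose its mapping, and the detailed result for the example sentence is only given in FIG. 2, so this sketch does not claim to reproduce it.

```python
from enum import Enum
from typing import Dict, List, Tuple


class FunctionPointType(Enum):
    ILF = "internal logical file"
    EIF = "external interface file"
    EI = "external input"
    EO = "external output"
    EQ = "external inquiry"


# Hypothetical rule table; not taken from the patent.
OPERATION_RULES: Dict[str, FunctionPointType] = {
    "record": FunctionPointType.EI,
    "modify": FunctionPointType.EI,
    "delete": FunctionPointType.EI,
    "query": FunctionPointType.EQ,
    "export": FunctionPointType.EO,
}


def classify(pairs: List[Tuple[str, str]]) -> List[Tuple[str, FunctionPointType]]:
    """Label (operation, entity) pairs; a maintained entity also yields one ILF."""
    results: List[Tuple[str, FunctionPointType]] = []
    maintained = set()
    for operation, entity in pairs:
        fp_type = OPERATION_RULES.get(operation)
        if fp_type is None:
            continue  # unknown operation: no function point in this sketch
        if fp_type is FunctionPointType.EI and entity not in maintained:
            # Data that the system itself maintains is counted once as an ILF.
            maintained.add(entity)
            results.append((entity, FunctionPointType.ILF))
        results.append((f"{operation} {entity}", fp_type))
    return results


function_points = classify(
    [("record", "device card"), ("modify", "device card"), ("query", "device card")]
)
```

For the three pairs above, the sketch yields one ILF for "device card", two EIs and one EQ.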

In the function point identification method detailed in steps S101 to S105, the named entities in the requirement description sentence are identified by the named entity recognition model, the named entities are merged with the segmented words of the requirement description sentence, and the function points are identified by analyzing the part-of-speech tagging result of the merged result.

A second aspect of the present embodiment provides a software development cost estimation method, which may also be executed by an estimation device, where the estimation device may be software, or a combination of software and hardware, and the estimation device may be integrally disposed in a server, a terminal device, or the like. Specifically, as shown in fig. 4, the software development cost estimation method includes the following steps S201 to S202.

Step S201, identifying a function point in the software requirement document by using any one of the function point identification methods in steps S101 to S105 in the first aspect.

Step S202: calculate the software research and development cost based on the function points and the industry benchmark data.
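The patent does not give the cost formula. A common approach, shown here only as a sketch, is to weight each function point type, sum the weights into an unadjusted function point count, and multiply by a productivity figure and a labor rate taken from benchmark data. The weights below are the average-complexity weights commonly used in function point counting; the productivity and rate values are invented placeholders, not real benchmark data.

```python
from typing import Dict

# Average-complexity weights per function point type (assumed, not from the patent).
AVERAGE_WEIGHTS: Dict[str, int] = {"ILF": 10, "EIF": 7, "EI": 4, "EO": 5, "EQ": 4}


def unadjusted_function_points(counts: Dict[str, int]) -> int:
    """Sum the weighted counts of each function point type."""
    return sum(AVERAGE_WEIGHTS[kind] * n for kind, n in counts.items())


def estimate_cost(counts: Dict[str, int], hours_per_fp: float, cost_per_hour: float) -> float:
    """Cost = function points x effort per function point x labor rate."""
    return unadjusted_function_points(counts) * hours_per_fp * cost_per_hour


# Hypothetical counts and benchmark figures.
counts = {"ILF": 1, "EI": 3, "EO": 1, "EQ": 1}
ufp = unadjusted_function_points(counts)
cost = estimate_cost(counts, hours_per_fp=8.0, cost_per_hour=50.0)
```

With these placeholder inputs the count is 10 + 12 + 5 + 4 = 31 function points, and the estimated cost is 31 × 8 × 50 = 12400 currency units.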

A third aspect of this embodiment provides a function point identification device. As shown in FIG. 5, the device includes a sentence acquisition module, a named entity acquisition module, a merging module, a part-of-speech tagging module, and a function point identification module that are communicatively connected in sequence, and a word segmentation module is further communicatively connected between the sentence acquisition module and the merging module.

In a possible design, the recognition apparatus further includes a storage module, and the storage module is configured to store information such as the requirement description statement, the training set of the named entity recognition model, and the like.

In one possible design, the function point identification module comprises a dependency analysis module and an identification module which are sequentially connected in a communication manner, wherein the dependency analysis module performs dependency syntax analysis on the part-of-speech tagging result to obtain the dependency relationship between words; the identification module identifies the functional points according to the dependency relationships.

In one possible design, the named entity acquisition module includes a bidirectional pre-trained language model BERT and a conditional random field (CRF) connected in sequence.

A fourth aspect of the present embodiment provides a software development cost estimation device, which includes a function point identification device and a cost accounting module that are communicatively connected in sequence.

The function point identification device is the function point identification device of the third aspect or of any of its possible designs.

A fifth aspect of the present embodiment provides a computer device, including a memory and a processor, where the memory is used for storing a computer program, and the processor is used for reading the computer program and executing the function point identification method according to the first aspect or the software development cost estimation method according to the second aspect. For example, the memory may include, but is not limited to, a random-access memory (RAM), a read-only memory (ROM), a flash memory, a first-in first-out (FIFO) memory, and/or a first-in last-out (FILO) memory; the processor may be, but is not limited to, a microprocessor of the STM32F105 family. In addition, the computer device may also include, but is not limited to, a power module, a display screen, and other necessary components.

A sixth aspect of the present embodiment provides a computer-readable storage medium, which stores instructions that, when executed on a computer, perform the function point identification method according to the first aspect or the software development cost estimation method according to the second aspect.

The embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: modifications may be made to the embodiments described above, or equivalents may be substituted for some of the features described. And such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Finally, it should be noted that the present invention is not limited to the above alternative embodiments, and that anyone can derive other forms of products in light of the present invention. The above detailed description should not be taken as limiting the scope of the invention, which is defined by the claims; the description may be used to interpret the claims.

Claims

1. A method for identifying a function point is characterized by comprising the following steps:

acquiring at least one requirement description statement;

2. The method according to claim 1, wherein the processing the part-of-speech tagging result to identify the functional point comprises:

the functional points are identified according to the dependency relationships.

3. The method according to claim 1, wherein the named entity recognition model comprises a bidirectional pre-trained language model BERT and a conditional random field (CRF) connected in sequence.

4. The method according to claim 3, wherein the training method of the bidirectional pre-trained language model BERT comprises:

and fine-tuning the pre-trained BERT model with at least one item of corpus data of a professional field, wherein each item comprises at least one requirement description sentence of the professional field.

5. The method of claim 1, wherein the function point comprises at least one of an internal logical file (ILF), an external interface file (EIF), an external input (EI), an external output (EO), and an external inquiry (EQ).

6. A software development cost estimation method, characterized by comprising the following steps:

identifying the functional points in the software requirement document by adopting the functional point identification method of any one of the claims 1-5;

7. A function point identification device is characterized by comprising a sentence acquisition module, a named entity acquisition module, a merging module, a part-of-speech tagging module and a function point identification module which are sequentially connected in a communication manner, wherein a word segmentation module is further connected between the sentence acquisition module and the merging module in a communication manner;

the merging module is used for merging the at least one named entity with the segmented words in the word segmentation set;

8. A software development cost estimation device, characterized by comprising a function point identification device and a cost accounting module which are communicatively connected in sequence,

the function point identifying apparatus is the function point identifying apparatus of claim 7;

9. A computer device comprising a memory and a processor communicatively connected, wherein the memory is configured to store a computer program, and the processor is configured to read the computer program, perform the function point identification method according to any one of claims 1 to 5, or perform the software development cost estimation method according to claim 6.

10. A computer-readable storage medium having stored thereon instructions for performing the method of any one of claims 1-5 or the method of estimating development cost of software according to claim 6 when the instructions are run on a computer.