CN114363290A - Domain name identification method, device, equipment and storage medium - Google Patents

Publication number
CN114363290A
Authority
CN
China
Prior art keywords
domain name
data
sub
feature
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111672272.2A
Other languages
Chinese (zh)
Other versions
CN114363290B (en)
Inventor
赖秋楠
梁彧
傅强
蔡琳
杨满智
田野
王杰
阿曼太
金红
陈晓光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Eversec Beijing Technology Co Ltd
Original Assignee
Eversec Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Eversec Beijing Technology Co Ltd
Priority to CN202111672272.2A (CN114363290B)
Publication of CN114363290A
Application granted
Publication of CN114363290B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiment of the invention discloses a domain name identification method, a domain name identification device, domain name identification equipment and a storage medium. The method comprises the following steps: determining sub-domain name data in the domain name data to be identified; acquiring at least one sub-domain name feature of the sub-domain name data, and matching each sub-domain name feature with a target feature condition to obtain a sub-domain name feature matching result; wherein the target characteristic condition is determined according to the sub-domain name level threshold and the characteristic structure data; and under the condition that the matching result of the characteristics of each sub-domain name is determined to be successful, determining the domain name data to be identified as the target domain name data. The embodiment of the invention can realize the fast and batch domain name identification without occupying network communication bandwidth, save network resources and reduce the domain name identification cost.

Description

Domain name identification method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a domain name identification method, a domain name identification device, domain name identification equipment and a storage medium.
Background
Domain name type identification is of great significance for technologies such as network operation and maintenance and network security detection. For example, CDN (Content Delivery Network) technology may return a plurality of IP (Internet Protocol) addresses for providing services in a DNS (Domain Name System) record, so that even if a single server fails, other servers can still provide services, which greatly improves service availability. In addition, load balancing can be achieved by rotating among the plurality of service IP addresses. When CDN technology is applied in a network, a CDN domain name can be identified and its access speed can be tested. Alternatively, during network security detection, a CDN domain name has a structure similar to that of an illegal domain name generated by Fast-flux technology and is easily misjudged as an abnormal domain name; CDN domain names can therefore be further identified among the abnormal domain names so that illegal domain names can be detected accurately.
In the prior art, the domain name type is generally required to be identified by accessing the domain name and the IP address corresponding to the domain name. However, such a method cannot quickly identify the domain name type in batch, and needs to occupy network communication bandwidth, consumes network resources, and has extremely high domain name identification cost.
Disclosure of Invention
Embodiments of the present invention provide a domain name recognition method, apparatus, device, and storage medium, so as to realize fast and batch domain name recognition without occupying network communication bandwidth, save network resources, and reduce domain name recognition cost.
In a first aspect, an embodiment of the present invention provides a domain name identification method, including:
determining sub-domain name data in the domain name data to be identified;
acquiring at least one sub-domain name feature of the sub-domain name data, and matching each sub-domain name feature with a target feature condition to obtain a sub-domain name feature matching result; wherein the target feature condition is determined according to a sub-domain name level threshold and feature structure data;
and under the condition that the matching result of the sub-domain name features is determined to be successful, determining the domain name data to be identified as target domain name data.
In a second aspect, an embodiment of the present invention further provides a domain name recognition apparatus, including:
the sub-domain name determining module is used for determining sub-domain name data in the domain name data to be identified;
the characteristic matching module is used for acquiring at least one sub-domain name characteristic of the sub-domain name data and matching each sub-domain name characteristic with a target characteristic condition to obtain a sub-domain name characteristic matching result; wherein the target feature condition is determined according to a sub-domain name level threshold and feature structure data;
and the first determining module is used for determining the domain name data to be identified as the target domain name data under the condition that the matching result of the characteristics of the sub domain names is determined to be successful.
In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the domain name recognition method provided by any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the domain name identification method provided in any embodiment of the present invention.
According to the embodiment of the invention, sub-domain name data is determined in the domain name data to be identified, at least one sub-domain name feature of the sub-domain name data is obtained, and each sub-domain name feature is matched with a target feature condition determined according to the sub-domain name level threshold and the feature structure data, so as to obtain a sub-domain name feature matching result. The domain name data to be identified is determined to be the target domain name data when every sub-domain name feature matching result is a successful match. The domain name is thus identified based on its own features, which avoids the low identification efficiency and heavy use of network resources caused in the prior art by relying on access to the domain name and its IP address. Domain names can be identified quickly and in batches without occupying network communication bandwidth, which saves network resources and reduces the domain name identification cost.
Drawings
Fig. 1 is a flowchart of a domain name recognition method according to an embodiment of the present invention.
Fig. 2 is a flowchart of a domain name recognition method according to a second embodiment of the present invention.
Fig. 3 is a schematic flowchart of CDN domain name identification according to a second embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a domain name recognition apparatus according to a third embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention.
It should be further noted that, for the convenience of description, only some but not all of the relevant aspects of the present invention are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example one
Fig. 1 is a flowchart of a domain name recognition method according to an embodiment of the present invention. This embodiment is applicable to a case of identifying whether any domain name is a domain name of a specific type. The method may be executed by a domain name recognition apparatus according to an embodiment of the present invention, where the apparatus may be implemented by software and/or hardware and may generally be integrated in a computer device. Accordingly, as shown in fig. 1, the method comprises the following operations:
S110, determining sub-domain name data in the domain name data to be identified.
The domain name data to be identified may be domain name content data that needs to be identified as whether the domain name is a domain name of a specific type. The sub domain name data may be domain name content data of a sub domain name in the domain name data to be recognized.
Correspondingly, the domain name data to be recognized may generally include content data of each domain name structure, and the domain name structure may generally include a root domain name and a main domain name, and may further extend sub-domain names based on the main domain name. Thus, the sub-domain data part thereof can be determined in the domain data to be identified.
S120, at least one sub-domain name feature of the sub-domain name data is obtained, and each sub-domain name feature is matched with a target feature condition to obtain a sub-domain name feature matching result.
Wherein the target feature condition is determined according to a sub-domain name level threshold and feature structure data.
In particular, the sub-domain name features may be features of the sub-domain name in any dimension, and may include, but are not limited to, features that can be matched with the target feature condition. The target feature condition may be a particular condition that the sub-domain name features of the specific type of domain name to be identified need to satisfy. The sub-domain name level threshold may be an extreme value of the number of sub-domain name levels included in the specific type of domain name to be identified. The feature structure data may be content data of a specific structure included in a sub-domain name of the specific type of domain name to be identified.
Correspondingly, the target characteristic condition can be predetermined according to the specific type of domain name to be identified and the characteristics of the sub-domain name in the type of domain name. The characteristics of the sub-domain name in the specific type domain name may include the characteristics of the sub-domain name level number dimension, and then the sub-domain name level threshold may be determined according to the extreme value of the sub-domain name level number included in the specific type domain name, so that the target characteristic condition is determined according to the sub-domain name level threshold, and is used to match the domain name data to be identified whose sub-domain name level number is within the range defined by the sub-domain name level threshold. The characteristics of the sub-domain name in the specific type domain name may further include characteristics of content data included in the sub-domain name, and then, the content of the specific structure included in the sub-domain name in the specific type domain name may be used as feature structure data, so that a target feature condition is determined according to the feature structure data, and the target feature condition is used for matching out domain name data to be identified including the feature structure data in the content of the sub-domain name.
Further, at least one sub-domain name feature can be acquired from the acquired sub-domain name data of the domain name data to be identified, and each sub-domain name feature is matched with a predetermined target feature condition to obtain a sub-domain name feature matching result, and the sub-domain name feature matching result can describe whether the sub-domain name feature of the sub-domain name data in the domain name data to be identified meets the target feature condition, so that whether the domain name data to be identified is the content data of the specific type of domain name to be identified can be determined according to the sub-domain name feature matching result.
S130, under the condition that the matching result of the sub-domain name features is determined to be successful, determining the domain name data to be identified as target domain name data.
Wherein, successful matching can indicate that the sub-domain name features satisfy the description of the target feature condition. The target domain name data may be content data for a specific type of domain name that needs to be detected.
Correspondingly, the matching results of all the sub-domain names of the domain name data to be recognized are successful, and it can be shown that all the sub-domain name features of the sub-domain name data meet the description of the corresponding target feature conditions, so that it can be determined that the domain name data to be recognized has the specific features of the target domain name data, and it can be determined that the domain name data to be recognized is the target domain name data.
The embodiment of the invention provides a domain name identification method. Sub-domain name data is determined in the domain name data to be identified, at least one sub-domain name feature of the sub-domain name data is obtained, and each sub-domain name feature is matched with a target feature condition determined according to a sub-domain name level threshold and feature structure data, so as to obtain a sub-domain name feature matching result. The domain name data to be identified is determined to be the target domain name data when every sub-domain name feature matching result is a successful match. The domain name is thus identified based on its own features, which solves the problems of low identification efficiency and heavy use of network resources caused in the prior art by relying on access to the domain name and its IP address. Domain names can be identified quickly and in batches without occupying network communication bandwidth, which saves network resources and reduces the domain name identification cost.
Example two
Fig. 2 is a flowchart of a domain name recognition method according to a second embodiment of the present invention. The embodiment of the present invention is embodied on the basis of the above-described embodiment, and in the embodiment of the present invention, a specific optional implementation manner is provided for obtaining at least one sub-domain name feature of the sub-domain name data, and matching each sub-domain name feature with a target feature condition to obtain a sub-domain name feature matching result.
As shown in fig. 2, the method of the embodiment of the present invention specifically includes:
S210, determining sub-domain name data in the domain name data to be identified.
In an optional embodiment of the present invention, the determining the sub-domain name data in the domain name data to be identified may include: identifying tail end root domain name data in the domain name data to be identified, and determining root domain name separation data of the tail end root domain name data; determining main domain name data in the domain name data to be identified and main domain name separation data of the main domain name data according to the tail end root domain name data and the root domain name separation data; and determining the sub-domain name data in the domain name data to be identified according to the main domain name data and the main domain name separation data.
The tail end root domain name data may be the root domain name content data located at the very end of the domain name data to be identified. The main domain name data may be the content data separated from the tail end root domain name data only by the root domain name separation data. The root domain name separation data is used for separating the tail end root domain name data from the main domain name data. The main domain name separation data is used for separating the main domain name data from the sub-domain name data. Alternatively, the root domain name separation data and the main domain name separation data may be any same or different characters that can be distinguished from the domain name content data, and may be, for example, the "." symbol in the domain name.
Correspondingly, the root domain name in the structure at the very end of the domain name data to be identified can be determined as the root domain name of the domain name to be identified, so that the tail end root domain name data can be identified according to the root domain name. The separation data adjacent to the tail end root domain name data can be determined as the root domain name separation data, which marks the root domain name structure at the end of the domain name data to be identified and distinguishes the tail end root domain name data from the main domain name data. Therefore, the main domain name data separated from the tail end root domain name data by the root domain name separation data can be determined according to the tail end root domain name data and the root domain name separation data.
Further, the root domain name separation data and the main domain name separation data may jointly delimit the main domain name structure of the domain name data to be identified, and the main domain name data may be adjacent to both the root domain name separation data and the main domain name separation data, so that the main domain name separation data may be determined according to the determined main domain name data and the root domain name separation data. Therefore, the sub-domain name data separated from the main domain name data by the main domain name separation data can be further determined according to the main domain name data and the main domain name separation data.
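The splitting described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the names `DomainParts` and `split_domain` are hypothetical, the separation data is assumed to be the "." symbol, and the tail end root domain name is assumed to be a single label such as "com".

```python
from typing import NamedTuple


class DomainParts(NamedTuple):
    sub_domain: str   # everything in front of the main domain name (may be empty)
    main_domain: str  # label directly in front of the tail end root domain name
    root_domain: str  # label at the very end of the domain name


def split_domain(domain: str, sep: str = ".") -> DomainParts:
    """Split a domain name into sub-domain, main domain and tail end root domain.

    Assumption: the root domain name is the single last label.
    """
    # The last separator plays the role of the root domain name separation data.
    rest, _, root = domain.strip().lower().rpartition(sep)
    # The next separator from the end is the main domain name separation data.
    sub, _, main = rest.rpartition(sep)
    return DomainParts(sub_domain=sub, main_domain=main, root_domain=root)
```

For a domain name without any sub-domain name, such as "example.com", the sketch returns an empty `sub_domain` string.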
S220, obtaining at least one sub-domain name feature of the sub-domain name data, and matching each sub-domain name feature with a target feature condition to obtain a sub-domain name feature matching result.
Wherein the target feature condition is determined according to a sub-domain name level threshold and feature structure data.
In an optional embodiment of the present invention, S220 may specifically include:
S221, obtaining the level quantity characteristics of the sub-domain name data.
Wherein the level number feature may be a feature describing the level number of the sub domain name data.
Correspondingly, the domain name data to be identified may include one or more levels of sub-domain names, and the target characteristic condition may be determined according to the sub-domain name level threshold, so that the level quantity characteristic of the sub-domain name data may be obtained according to the level quantity of the sub-domain names of the domain name data to be identified, and the sub-domain name characteristic matching result may be obtained according to the matching degree between the level quantity characteristic and the description of the target characteristic condition.
In an optional embodiment of the present invention, the obtaining of the level quantity characteristic of the sub-domain name data may include: identifying level separation data in the sub-domain name data; and, in a case where it is determined that the number of level separation data in the sub-domain name data is greater than or equal to a separation number threshold, determining the level quantity characteristic as the level number being greater than or equal to the sub-domain name level threshold.
The level separation data may be any character which can be distinguished from the content data of two adjacent levels of sub-domain names, and may be used to separate the sub-domain name data of two adjacent levels. The separation number threshold may be predetermined according to the sub-domain name level threshold, and may be a number of level separation data in the sub-domain name data when the sub-domain name level is the sub-domain name level threshold.
Correspondingly, the sub-domain name content data of each level in the sub-domain name data may be separated by level separation data, and the number of level separation data may be positively correlated with the number of sub-domain name levels; for example, the number of sub-domain name levels may be the number of level separation data plus one. The level separation data can therefore be identified in the sub-domain name data, and when the number of level separation data is determined to be greater than or equal to the separation number threshold, it can be determined that the number of levels into which the sub-domain name data is divided by the level separation data is greater than or equal to the sub-domain name level threshold.
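The relation between the separation number threshold and the sub-domain name level threshold can be sketched as below. The function name `meets_level_threshold` is hypothetical, the level separation data is assumed to be ".", and the default threshold of 2 is only an assumed value for illustration.

```python
def meets_level_threshold(sub_domain: str, level_threshold: int = 2,
                          sep: str = ".") -> bool:
    """Return True if the sub-domain name has at least `level_threshold` levels.

    The check counts level separation data instead of splitting the string:
    levels = separators + 1, so the separation number threshold is
    level_threshold - 1.
    """
    if not sub_domain:
        # No sub-domain name at all, so no level can satisfy the threshold.
        return False
    separation_threshold = level_threshold - 1
    return sub_domain.count(sep) >= separation_threshold
```

Counting separators avoids building a list of levels when only the threshold comparison is needed, which suits batch screening.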
S222, under the condition that the level quantity of the sub domain name data is determined to be larger than or equal to the sub domain name level threshold value according to the level quantity characteristic, at least one level content data of the sub domain name data is obtained.
Wherein the level content data may be content data of a sub domain name of an arbitrary level.
Correspondingly, the characteristics of the sub-domain name in the target domain name data may include that the sub-domain name has multiple levels. In order to determine whether the domain name to be identified is the target domain name data, the minimum number of sub-domain name levels of the target domain name data may be determined as the sub-domain name level threshold, and the target feature condition may include that the number of levels of the sub-domain name data is greater than or equal to the sub-domain name level threshold. The part of the domain name data to be identified whose number of sub-domain name levels is greater than or equal to the sub-domain name level threshold may then be screened out according to the level quantity characteristics of the domain name data to be identified.
Optionally, the target domain name data may be a CDN domain name. Specifically, if there is a domain name with a sub-domain name level more than one level in the access domain name provided by the CDN, the sub-domain name level threshold may be determined as a minimum value 2 of the level number that may occur, and domain name data to be identified with a sub-domain name level number more than one level may be screened according to the sub-domain name level threshold, so that the CDN domain name may be further identified in the part of the domain name data to be identified.
For example, when screening abnormal domain names, a CDN domain name has multiple levels of sub-domain names and is easily identified as an abnormal domain name. The domain name to be identified may be an abnormal domain name, and the target domain name data may include the CDN domain names that need to be identified among the abnormal domain names, so that illegal domain names can be further located among the abnormal domain names other than the CDN domain names. Therefore, in order to distinguish the CDN domain names among the abnormal domain names, the minimum number of sub-domain name levels of a CDN domain name may be used as the sub-domain name level threshold, and the target feature condition may be determined to include that the number of levels of the sub-domain name data is greater than or equal to the sub-domain name level threshold. The domain name data to be identified that has many sub-domain name levels and is suspected to be a CDN domain name may then be screened out according to the level quantity characteristics of the domain name data to be identified, so that this part of the domain name data to be identified can be screened further in the subsequent step.
In an optional embodiment of the present invention, the obtaining at least one level content data of the sub domain name data may include: acquiring the level content data of the terminal level of the sub domain name data; repeatedly executing the step of obtaining the level content data of the previous level under the condition that the feature structure data is not included in the level content data until the feature structure data is included in the level content data or all the level content data is determined not to include the feature structure data.
Wherein the end level may be the level at the end of the levels of the sub-domain name.
Correspondingly, when it is determined that the number of levels of the domain name data to be identified is greater than or equal to the sub-domain name level threshold, it can be stated that the sub-domain name data in the domain name data to be identified includes content data of multiple levels, and then the level content data of the end level of the sub-domain name data can be acquired, and it is determined whether feature structure data is included therein. Specifically, any realizable method may be used to determine whether the level content data includes the feature structure data, for example, a fuzzy matching method may be used, and is not limited herein.
Further, in the case where it is determined that the feature structure data is included in the level content data of the end level, the level content data of the previous level need not be acquired. In the case where it is determined that the feature structure data is not included in the level content data of the end level, the level content data of the previous level may be acquired, and this step may be repeated level by level until either the acquired level content data is determined to include the feature structure data, at which point acquisition stops, or the level content data of all levels in the sub-domain name data has been acquired and none of it is determined to include the feature structure data.
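The end-to-front scan with early stopping can be sketched as follows. The function name `find_feature_level` is hypothetical, and the sketch uses exact set membership per level as a simple stand-in for whatever matching method (for example fuzzy matching) an implementation actually chooses; the feature structure values passed in are likewise assumptions.

```python
from typing import Iterable, Optional


def find_feature_level(sub_domain: str, feature_structures: Iterable[str],
                       sep: str = ".") -> Optional[int]:
    """Scan sub-domain levels from the end level toward the front.

    Returns the index of the first level (counted from the front) whose
    content equals one of the feature structure values, or None if no
    level matches.
    """
    features = set(feature_structures)
    levels = sub_domain.split(sep)
    # Start at the end level and move to the previous level on each miss.
    for index in range(len(levels) - 1, -1, -1):
        if levels[index] in features:
            return index  # stop early: earlier levels are not inspected
    return None
```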
S223, under the condition that the feature structure data is determined to be included in any level content data, determining that the sub-domain name feature matching result is successful in matching.
Correspondingly, the characteristics of the sub-domain name in the target domain name data may further include content data including a specific structure in the sub-domain name data. Therefore, in order to further identify the domain name data to be identified, the content data including the specific structure in the sub-domain name data of the target domain name data may be determined as feature structure data, and the target feature condition may include the feature structure data in the sub-domain name data, so that the domain name data to be identified including the feature structure data in the sub-domain name data may be further screened out from the domain name data to be identified, the level number of the sub-domain name of the domain name data to be identified being greater than or equal to the sub-domain name level threshold, and the matching of the level number features of the sub-domain name of the domain name data to be identified and the features of the level content data is realized through the target feature condition, and the sub-domain name feature matching result corresponding to the domain name data to be identified is obtained.
Optionally, the feature structure data included in the sub-domain name data of the CDN domain name may be root domain name structure content data. Specifically, for a domain name formed by connecting two conventional domain names through separation data, the root domain name structure content data may be the content data that a root domain name usually includes.
Illustratively, the CDN domain name data "apkselfdl.vivo.com.cn.wsglb0.com" is composed of the domain names "apkselfdl.vivo.com.cn" and "wsglb0.com". According to the separation data, the "." symbol, it can be identified that the root domain name data located at the very end is "com" and the main domain name data is "wsglb0"; the domain name has four levels of sub-domain names, and the end of the sub-domain name data is the root domain name structure content data "com.cn". Thus, in the case that root domain name structure content data is identified in any sub-domain name level, from the end to the front end, of domain name data to be identified that includes multi-level sub-domain names, it can be determined that this sub-domain name level is the root domain name of the conventional domain name located at the front of the two conventional domain names constituting the CDN domain name, and the obtained sub-domain name feature matching result is a successful match, which can be used to determine that the domain name data to be identified is a CDN domain name. When CDN domain names need to be identified among abnormal domain names, the sub-domain names of an illegal domain name with multi-level sub-domain names do not include the feature structure data, so that CDN domain names and illegal domain names can be distinguished among the abnormal domain names.
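The worked example can be traced with a short end-to-end sketch, reading the example domain as "apkselfdl.vivo.com.cn.wsglb0.com". Everything named here is an assumption made for illustration: the function `is_cdn_like`, the set `ROOT_LABELS` of root domain name structure values, the threshold of 2, and the use of exact per-level matching.

```python
# Hypothetical root domain name structure values; a real deployment would
# choose its own feature structure data.
ROOT_LABELS = frozenset({"com", "cn", "net", "org"})


def is_cdn_like(domain: str, level_threshold: int = 2,
                root_labels: frozenset = ROOT_LABELS) -> bool:
    """Offline check: enough sub-domain levels AND a root-like label among them."""
    labels = domain.strip().lower().split(".")
    # Drop the main domain name and the tail end root domain name.
    sub_levels = labels[:-2]
    # Level quantity feature: compare against the sub-domain name level threshold.
    if len(sub_levels) < level_threshold:
        return False
    # Feature structure: scan from the end level toward the front.
    return any(level in root_labels for level in reversed(sub_levels))


if __name__ == "__main__":
    for name in ("apkselfdl.vivo.com.cn.wsglb0.com",
                 "a1b2.c3d4.e5f6.example.com",
                 "www.example.com"):
        print(name, is_cdn_like(name))
```

No DNS query or connection is made at any point, which is what lets the check run in batches without using network bandwidth.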
In an optional embodiment of the present invention, after the obtaining at least one sub-domain name feature of the sub-domain name data, and matching each sub-domain name feature with a target feature condition to obtain a sub-domain name feature matching result, the method may further include: under the condition that the level number is smaller than the sub-domain name level threshold value, determining that the sub-domain name feature matching result is matching failure, and acquiring main domain name data in the domain name data to be identified; and under the condition that the main domain name data is determined to have the target main domain name characteristics, determining the domain name data to be identified as the target domain name data.
The target main domain name feature may be a specific feature possessed by main domain name data of a specific type of target domain name data to be identified.
Accordingly, target domain name data may also be characterized by main domain name data that has a specific target main domain name feature, which distinguishes it from the main domain name data of any other type of domain name. Therefore, if the level number of the domain name data to be identified is determined to be smaller than the sub-domain name level threshold, the sub-domain name of the domain name data to be identified does not have the sub-domain name feature of the target domain name data. In that case, the main domain name data of the domain name data to be identified can be acquired, and the domain name data to be identified is determined to be the target domain name data if its main domain name data has the target main domain name feature.
In an optional embodiment of the present invention, the determining that the primary domain name data has the target primary domain name characteristic may include: acquiring a preset number of target position content characters in the main domain name data according to the characteristic content data; determining that the main domain name data has the target main domain name characteristic if it is determined that the target location content character belongs to the characteristic content data.
The feature content data may be specific content included at a specific position of the main domain name data of the specific type of target domain name data to be identified. The preset number may be a number determined according to the number of characters of the feature content data. The target position content characters may be the preset number of characters acquired at the specific position, within the main domain name data of the domain name data to be identified, where the feature content data may appear.
Accordingly, the target main domain name feature of the target domain name data may be that its main domain name data includes the feature content data at a specific position. The preset number may therefore be determined according to the feature content data; optionally, the preset number may be the largest character count among the pieces of feature content data. Once the position of the feature content data is determined, the preset number of target position content characters can be obtained from the main domain name data of the domain name data to be identified. Further, if the target position content characters equal any piece of feature content data, the domain name data to be identified has the specific feature of the target domain name data and can be determined to be the target domain name data.
Optionally, CDN domain names may include domain names whose main domain name data is formed by appending the feature content data "cdn" or "dns" as terminal characters to conventional main domain name data, and the preset number of target position content characters may be the last three characters of the main domain name data.
Illustratively, the main domain name data of CDN domain name data "idv1pcm. Then, the last three characters of the main domain name data in the domain name data to be identified may be obtained as the target position content characters, and the domain name data to be identified is determined to be CDN domain name data when the target position content characters are "cdn" or "dns". When CDN domain names need to be identified among abnormal domain names, an illegal domain name whose main domain name data is a random character string does not have the target main domain name feature, so CDN domain names and illegal domain names can be distinguished among the abnormal domain names.
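The terminal-character check on the main domain name data can be sketched as below. The function name `main_domain_has_target_feature` and the sample main domain names are hypothetical; the feature content data ("cdn", "dns") and the choice of the preset number as the largest character count follow the description above. Lowercasing the input is an added assumption, based on domain names being case-insensitive.

```python
FEATURE_CONTENT = ("cdn", "dns")  # feature content data named in the text


def main_domain_has_target_feature(main_domain, feature_content=FEATURE_CONTENT):
    """Check whether the terminal characters of the main domain name data
    belong to the feature content data."""
    # Preset number: the largest character count among the feature content
    # data (three for "cdn" and "dns").
    preset_number = max(len(item) for item in feature_content)
    # Target position content characters: the last `preset_number` characters.
    target_chars = main_domain.lower()[-preset_number:]
    # endswith() also covers feature content shorter than the preset number.
    return any(target_chars.endswith(item) for item in feature_content)


# Hypothetical main domain names used purely for illustration.
print(main_domain_has_target_feature("examplecdn"))
print(main_domain_has_target_feature("x7qk2p9z"))
```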
S230, determining the domain name data to be identified as the target domain name data in a case where the sub-domain name feature matching result is determined to be a successful match.
Exemplarily, Fig. 3 is a schematic flowchart of CDN domain name identification provided in the second embodiment of the present invention. In a specific example, as shown in Fig. 3, to identify whether domain name data to be identified is a CDN domain name, it may first be determined, for each domain name in the acquired original data set of domain names to be identified, whether the domain name has at most one level of sub-domain names. If so, the main domain name of the domain name may be extracted, and it is determined whether the main domain name ends with the characters "cdn" or "dns"; if it does, the domain name may be determined to be a CDN domain name. If a domain name to be identified has more than one level of sub-domain names, whether it is formed by combining two conventional domain names can be further determined by checking whether its sub-domain name includes root domain name content; if so, the domain name to be identified can also be determined to be a CDN domain name.
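The two branches of the Fig. 3 flow can be combined into one small function. The sketch below is a non-authoritative reading of the flowchart: the name `is_cdn_domain`, the constants, and the sample domain names are hypothetical, the threshold of two sub-domain levels is inferred from the phrase "at most one level of sub-domain names", and the root domain name and main domain name are each assumed to be a single label.

```python
ROOT_STRUCTURES = ("com.cn",)   # assumption: only "com.cn" is named in the text
MAIN_SUFFIXES = ("cdn", "dns")  # terminal characters named in the text
LEVEL_THRESHOLD = 2             # inferred from "at most one level" in Fig. 3


def is_cdn_domain(domain):
    """Classify a domain name using string features only (no DNS lookup)."""
    labels = domain.strip().rstrip(".").lower().split(".")
    if len(labels) < 2 or not all(labels):
        return False  # not a usable domain name
    # Assumption: single-label root domain name and main domain name.
    sub_labels, main_domain = labels[:-2], labels[-2]
    if len(sub_labels) < LEVEL_THRESHOLD:
        # Branch 1: at most one sub-domain level, check the main domain name.
        return main_domain.endswith(MAIN_SUFFIXES)
    # Branch 2: multi-level sub-domain, look for root domain name content.
    sub_domain = "." + ".".join(sub_labels) + "."
    return any("." + s + "." in sub_domain for s in ROOT_STRUCTURES)


# Hypothetical domain names used purely for illustration.
for name in ("img.examplecdn.com", "static.shop.com.cn.edge-example.net",
             "a.b.c.x7qk2p9z.com", "www.example.com"):
    print(name, is_cdn_domain(name))
```

Because the function only inspects the string, a batch of domain names can be classified without resolving or accessing any of them, which is the property the method relies on.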
The embodiment of the present invention provides a domain name identification method. Sub-domain name data is determined in the domain name data to be identified, at least one sub-domain name feature of the sub-domain name data is obtained, and each sub-domain name feature is matched against a target feature condition determined according to a sub-domain name level threshold and feature structure data, to obtain a sub-domain name feature matching result. The domain name data to be identified is determined to be the target domain name data when the sub-domain name feature matching result is a successful match. Because the domain name is identified from its own features, the method solves the prior-art problems of low identification efficiency and heavy use of network resources caused by relying on access to the domain name and its IP address. Domain names can be identified quickly and in batches without occupying network communication bandwidth, which saves network resources and reduces the cost of domain name identification.
EXAMPLE III
Fig. 4 is a schematic structural diagram of a domain name recognition apparatus according to a third embodiment of the present invention, as shown in fig. 4, the apparatus includes: a sub-domain name determining module 310, a feature matching module 320, and a first determining module 330.
The sub-domain name determining module 310 is configured to determine sub-domain name data in the domain name data to be identified.
The feature matching module 320 is configured to obtain at least one sub-domain name feature of the sub-domain name data, and match each sub-domain name feature with a target feature condition to obtain a sub-domain name feature matching result; wherein the target feature condition is determined according to a sub-domain name level threshold and feature structure data.
The first determining module 330 is configured to determine that the domain name data to be identified is target domain name data when it is determined that the matching result of the sub-domain name features is a successful match.
In an optional implementation manner of the embodiment of the present invention, the feature matching module 320 may include: the level quantity characteristic acquisition submodule is used for acquiring the level quantity characteristics of the sub-domain name data; a level content data obtaining sub-module, configured to obtain at least one level content data of the sub-domain name data when it is determined that the level number of the sub-domain name data is greater than or equal to the sub-domain name level threshold according to the level number characteristic; and the sub-domain name feature matching sub-module is used for determining that the sub-domain name feature matching result is successful in matching under the condition that the feature structure data is included in any level of content data.
In an optional implementation manner of the embodiment of the present invention, the level number feature obtaining sub-module may be specifically configured to: identify level separation data in the sub-domain name data; and, in a case where the number of pieces of level separation data in the sub-domain name data is determined to be greater than or equal to a separation number threshold, determine the level number feature as the level number being greater than or equal to the sub-domain name level threshold; wherein the separation number threshold is predetermined according to the sub-domain name level threshold.
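Counting separators instead of splitting the sub-domain name can be sketched as follows. The relation "separation number threshold = level threshold - 1" is an assumption made for this example (a sub-domain name with n levels contains n - 1 separators when only the sub-domain name string is inspected); the text states only that the one threshold is predetermined from the other. The function name and default threshold are likewise hypothetical.

```python
def level_count_reaches_threshold(sub_domain, level_threshold=2):
    """Decide from the number of "." separators whether the sub-domain
    name data has at least `level_threshold` levels."""
    # Assumption: n levels inside the sub-domain string imply n - 1 separators.
    separation_threshold = level_threshold - 1
    # An empty sub-domain has zero levels regardless of the separator count.
    return bool(sub_domain) and sub_domain.count(".") >= separation_threshold


print(level_count_reaches_threshold("apkselfdl.vivo.com.cn"))
print(level_count_reaches_threshold("www"))
```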
In an optional implementation manner of the embodiment of the present invention, the level content data obtaining sub-module may be specifically configured to: acquiring the level content data of the terminal level of the sub domain name data; repeatedly executing the step of obtaining the level content data of the previous level under the condition that the feature structure data is not included in the level content data until the feature structure data is included in the level content data or all the level content data is determined not to include the feature structure data.
In an optional implementation manner of the embodiment of the present invention, the apparatus may further include: the main domain name determining module is used for determining that the sub-domain name feature matching result is a matching failure and acquiring main domain name data in the domain name data to be identified under the condition that the level number is smaller than the sub-domain name level threshold value; and the second determining module is used for determining the domain name data to be identified as the target domain name data under the condition that the main domain name data is determined to have the target main domain name characteristics.
In an optional implementation manner of the embodiment of the present invention, the second determining module may be specifically configured to: acquiring a preset number of target position content characters in the main domain name data according to the characteristic content data; determining that the main domain name data has the target main domain name characteristic if it is determined that the target location content character belongs to the characteristic content data.
In an optional implementation manner of the embodiment of the present invention, the sub-domain name determining module 310 may be specifically configured to: identifying tail end root domain name data in the domain name data to be identified, and determining root domain name separation data of the tail end root domain name data; determining main domain name data in the domain name data to be identified and main domain name separation data of the main domain name data according to the tail end root domain name data and the root domain name separation data; wherein the root domain name separation data is used for separating the tail end root domain name data from the main domain name data; determining the sub-domain name data in the domain name data to be identified according to the main domain name data and the main domain name separation data; the main domain name separation data is used for separating the main domain name data from the sub domain name data.
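The three-step decomposition performed by the sub-domain name determining module can be sketched with two right-to-left splits. The names `DomainParts` and `split_domain` are hypothetical, and the sketch assumes that "." is the separation data and that the tail-end root domain name data is the single label after the last separator, as in the "wsglb0.com" example of the description.

```python
from typing import NamedTuple


class DomainParts(NamedTuple):
    sub_domain: str   # may be empty when the domain has no sub-domain
    main_domain: str
    root_domain: str


def split_domain(domain):
    """Split a domain name into sub-domain, main domain and root domain data."""
    name = domain.strip().rstrip(".").lower()
    # Step 1: the root domain separator is the last "."; the text after it
    # is the tail-end root domain name data.
    rest, root_sep, root = name.rpartition(".")
    if not root_sep or not rest or not root:
        raise ValueError(f"not a splittable domain name: {domain!r}")
    # Step 2: the main domain separator is the last "." of what remains;
    # the text after it is the main domain name data.
    sub, _main_sep, main = rest.rpartition(".")
    if not main:
        raise ValueError(f"empty main domain name in: {domain!r}")
    # Step 3: everything before the main domain separator is the
    # sub-domain name data.
    return DomainParts(sub, main, root)


print(split_domain("apkselfdl.vivo.com.cn.wsglb0.com"))
```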
The device can execute the domain name identification method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects for executing the method.
The embodiment of the present invention provides a domain name identification apparatus. Sub-domain name data is determined in the domain name data to be identified, at least one sub-domain name feature of the sub-domain name data is obtained, and each sub-domain name feature is matched against a target feature condition determined according to a sub-domain name level threshold and feature structure data, to obtain a sub-domain name feature matching result. The domain name data to be identified is determined to be the target domain name data when the sub-domain name feature matching result is a successful match. Because the domain name is identified from its own features, the apparatus solves the prior-art problems of low identification efficiency and heavy use of network resources caused by relying on access to the domain name and its IP address. Domain names can be identified quickly and in batches without occupying network communication bandwidth, which saves network resources and reduces the cost of domain name identification.
EXAMPLE IV
Fig. 5 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. Fig. 5 illustrates a block diagram of an exemplary computer device 12 suitable for implementing embodiments of the present invention. The computer device 12 shown in Fig. 5 is only an example and should not impose any limitation on the functionality or scope of use of embodiments of the present invention.
As shown in FIG. 5, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors 16, a memory 28, and a bus 18 that connects the various system components (including the memory 28 and the processors 16).
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, and commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a networking environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be appreciated that although not shown in FIG. 5, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processor 16 executes various functional applications and data processing by running the program stored in the memory 28, thereby implementing the domain name recognition method provided by the embodiment of the present invention: determining sub-domain name data in the domain name data to be identified; acquiring at least one sub-domain name feature of the sub-domain name data, and matching each sub-domain name feature with a target feature condition to obtain a sub-domain name feature matching result; wherein the target feature condition is determined according to a sub-domain name level threshold and feature structure data; and under the condition that the matching result of the sub-domain name features is determined to be successful, determining the domain name data to be identified as target domain name data.
EXAMPLE V
The fifth embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the domain name identification method provided in the embodiments of the present invention is implemented: determining sub-domain name data in the domain name data to be identified; acquiring at least one sub-domain name feature of the sub-domain name data, and matching each sub-domain name feature with a target feature condition to obtain a sub-domain name feature matching result; wherein the target feature condition is determined according to a sub-domain name level threshold and feature structure data; and under the condition that the matching result of the sub-domain name features is determined to be successful, determining the domain name data to be identified as target domain name data.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or computer device. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for identifying a domain name, comprising:
determining sub-domain name data in the domain name data to be identified;
acquiring at least one sub-domain name feature of the sub-domain name data, and matching each sub-domain name feature with a target feature condition to obtain a sub-domain name feature matching result; wherein the target feature condition is determined according to a sub-domain name level threshold and feature structure data;
and under the condition that the matching result of the sub-domain name features is determined to be successful, determining the domain name data to be identified as target domain name data.
2. The method according to claim 1, wherein the obtaining at least one sub-domain name feature of the sub-domain name data and matching each sub-domain name feature with a target feature condition to obtain a sub-domain name feature matching result comprises:
acquiring the level quantity characteristics of the sub-domain name data;
under the condition that the level quantity of the sub domain name data is determined to be larger than or equal to the sub domain name level threshold value according to the level quantity characteristic, at least one level content data of the sub domain name data is obtained;
and under the condition that the feature structure data is determined to be included in any one of the levels of content data, determining that the sub-domain name feature matching result is successful in matching.
3. The method according to claim 2, wherein the obtaining the level number characteristic of the sub-domain name data comprises:
identifying level separation data in the sub-domain name data;
in a case where the number of pieces of level separation data in the sub-domain name data is determined to be greater than or equal to a separation number threshold, determining the level number characteristic as the level number being greater than or equal to the sub-domain name level threshold; wherein the separation number threshold is predetermined according to the sub-domain name level threshold.
4. The method according to claim 2, wherein the obtaining at least one level of content data of the sub-domain name data comprises:
acquiring the level content data of the terminal level of the sub domain name data;
repeatedly executing the step of obtaining the level content data of the previous level under the condition that the feature structure data is not included in the level content data until the feature structure data is included in the level content data or all the level content data is determined not to include the feature structure data.
5. The method according to claim 2, wherein after the obtaining at least one sub-domain name feature of the sub-domain name data and matching each sub-domain name feature with a target feature condition to obtain a sub-domain name feature matching result, further comprising:
under the condition that the level number is smaller than the sub-domain name level threshold value, determining that the sub-domain name feature matching result is matching failure, and acquiring main domain name data in the domain name data to be identified;
and under the condition that the main domain name data is determined to have the target main domain name characteristics, determining the domain name data to be identified as the target domain name data.
6. The method of claim 5, wherein the determining that the primary domain name data has a target primary domain name characteristic comprises:
acquiring a preset number of target position content characters in the main domain name data according to the characteristic content data;
determining that the main domain name data has the target main domain name characteristic if it is determined that the target location content character belongs to the characteristic content data.
7. The method according to claim 1, wherein the determining the sub-domain name data in the domain name data to be identified comprises:
identifying tail end root domain name data in the domain name data to be identified, and determining root domain name separation data of the tail end root domain name data;
determining main domain name data in the domain name data to be identified and main domain name separation data of the main domain name data according to the tail end root domain name data and the root domain name separation data; wherein the root domain name separation data is used for separating the tail end root domain name data from the main domain name data;
determining the sub-domain name data in the domain name data to be identified according to the main domain name data and the main domain name separation data; the main domain name separation data is used for separating the main domain name data from the sub domain name data.
8. A domain name recognition apparatus, comprising:
the sub-domain name determining module is used for determining sub-domain name data in the domain name data to be identified;
the characteristic matching module is used for acquiring at least one sub-domain name characteristic of the sub-domain name data and matching each sub-domain name characteristic with a target characteristic condition to obtain a sub-domain name characteristic matching result; wherein the target feature condition is determined according to a sub-domain name level threshold and feature structure data;
and the first determining module is used for determining the domain name data to be identified as the target domain name data under the condition that the matching result of the characteristics of the sub domain names is determined to be successful.
9. A computer device, characterized in that the computer device comprises:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the domain name identification method as claimed in any one of claims 1-7.
10. A computer storage medium on which a computer program is stored, which program, when being executed by a processor, carries out a domain name recognition method according to any one of claims 1 to 7.
CN202111672272.2A 2021-12-31 2021-12-31 Domain name identification method, device, equipment and storage medium Active CN114363290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111672272.2A CN114363290B (en) 2021-12-31 2021-12-31 Domain name identification method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114363290A true CN114363290A (en) 2022-04-15
CN114363290B CN114363290B (en) 2023-08-29

Family

ID=81104856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111672272.2A Active CN114363290B (en) 2021-12-31 2021-12-31 Domain name identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114363290B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150142608A1 (en) * 2013-11-18 2015-05-21 Andrew Horn System and method for identifying domain names
GB2555801A (en) * 2016-11-09 2018-05-16 F Secure Corp Identifying fraudulent and malicious websites, domain and subdomain names
CN109450886A (en) * 2018-10-30 2019-03-08 杭州安恒信息技术股份有限公司 Domain name recognition method, system, electronic equipment and storage medium
CN110008705A (en) * 2019-04-15 2019-07-12 北京微步在线科技有限公司 Deep-learning-based malicious domain name identification method, device and electronic equipment
CN110674370A (en) * 2019-09-23 2020-01-10 鹏城实验室 Domain name identification method and device, storage medium and electronic equipment
CN113691489A (en) * 2020-05-19 2021-11-23 北京观成科技有限公司 Malicious domain name detection feature processing method and device and electronic equipment
CN111800404A (en) * 2020-06-29 2020-10-20 深信服科技股份有限公司 Method and device for identifying malicious domain name and storage medium
CN111818198A (en) * 2020-09-10 2020-10-23 腾讯科技(深圳)有限公司 Domain name detection method, domain name detection device, equipment and medium
CN113329035A (en) * 2021-06-29 2021-08-31 深信服科技股份有限公司 Method and device for detecting attack domain name, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHOUYU BAO;WENBO WANG;YUQING LAN: "Using Passive DNS to Detect Malicious Domain Name", 《CVISP 2019: PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON VISION, IMAGE AND SIGNAL PROCESSING》 *
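YIN Congxian (殷聪贤): "Research and Implementation of Malicious Domain Name Detection Technology Based on Big Data Analysis", 《China Excellent Master's Theses Full-text Database (Information Science and Technology)》 *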

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115361358A (en) * 2022-08-19 2022-11-18 山石网科通信技术股份有限公司 IP extraction method, device, storage medium and electronic device
CN115361358B (en) * 2022-08-19 2024-02-06 山石网科通信技术股份有限公司 IP extraction method and device, storage medium and electronic device

Also Published As

Publication number Publication date
CN114363290B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
JP6126672B2 (en) Malicious code detection method and system
CN112738102B (en) Asset identification method, device, equipment and storage medium
CN112866023A (en) Network detection method, model training method, device, equipment and storage medium
CN114363290B (en) Domain name identification method, device, equipment and storage medium
US10664594B2 (en) Accelerated code injection detection using operating system controlled memory attributes
CN114691161A (en) Key-Value-based software system configuration method and device and electronic equipment
CN114189390A (en) Domain name detection method, system, equipment and computer readable storage medium
CN112214770B (en) Malicious sample identification method, device, computing equipment and medium
US11563717B2 (en) Generation method, generation device, and recording medium
CN109068170B (en) Storage method, device, terminal and storage medium for barrage message
CN114116811B (en) Log processing method, device, equipment and storage medium
CN115643044A (en) Data processing method, device, server and storage medium
CN113010885B (en) Method and device for detecting kernel thread disguised with start address
CN112866005B (en) Method, device and equipment for processing user access log and storage medium
US8219667B2 (en) Automated identification of computing system resources based on computing resource DNA
CN113627535A (en) Data grading classification system and method based on data security and privacy protection
CN113420302A (en) Host vulnerability detection method and device
CN111510457A (en) Function attack detection method and device, electronic equipment and readable medium
CN111953813A (en) IP address identification method, system, electronic device and storage medium
CN113609352B (en) Character string retrieval method, device, computer equipment and storage medium
CN109902176B (en) Data association expansion method and non-transitory computer instruction storage medium
CN110134691B (en) Data verification method, device, equipment and medium
CN115022011A (en) Method, device, equipment and medium for identifying missed scanning software access request
CN117150123A (en) Resource allocation method and system based on cloud computing
CN117201149A (en) Data access method and system based on cloud computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant