Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a data processing method and a system for network security operation based on big data, which solve the problems in the background art.
To achieve the above purpose, the invention is realized by the following technical scheme: a data processing system for network security operation based on big data, comprising a flow acquisition module, an IP acquisition module, a data analysis module, a monitoring module, an evaluation module and a response module;
The flow acquisition module monitors network flow information to acquire real-time network flow data and form a flow data set;
The IP acquisition module monitors network communication content to acquire real-time IP address related information to form an IP address data set;
The data analysis module preprocesses the flow data set and the IP address data set to obtain flow related information, forms a first data set, obtains IP address related information, forms a second data set, matches the flow related information with the IP address related information, obtains one-to-many or one-to-one relation matching quantity of the IP addresses, and records the relation matching quantity to form a third data set;
the monitoring module performs normalization processing on the first data set, the second data set and the third data set, performs calculation, and obtains: abnormality index Yczs;
the abnormality index Yczs is obtained by the following formula:
$$\mathrm{Yczs} = (A \cdot \mathrm{Llxs} + B \cdot \mathrm{Fwxs}) + C$$
Wherein Llxs denotes the network traffic coefficient, Fwxs denotes the address access coefficient, A and B denote the proportionality coefficients of the network traffic coefficient Llxs and the address access coefficient Fwxs, respectively, and C denotes a first correction constant;
The network flow coefficient Llxs is obtained through calculation of the first data set and the third data set, and is compared with a preset network flow threshold L to obtain a network flow evaluation scheme;
the address access coefficient Fwxs is obtained through calculation of the second data set and the third data set, and is compared with a preset address access threshold Z to obtain an address access evaluation scheme;
the evaluation module matches the abnormality index Yczs against an abnormality alert threshold Y to obtain an abnormality level policy scheme;
and the response module executes the content of the abnormality level policy scheme.
Preferably, the flow acquisition module comprises a monitoring unit and an integration unit;
The monitoring unit is used for monitoring data packets in network communication in real time, extracting relevant information in the traffic packets, including source addresses, target addresses, data packet lengths, time stamps, the number of the data packets, loads, service types and port numbers, and marking the acquired information;
the integration unit integrates the marked flow packet information to form a structured data form so as to form a flow data set.
Preferably, the IP acquisition module includes a classification unit and an extraction unit;
The classification unit classifies the protocol fields and data packet types in the network communication content to obtain the IP address related information, wherein the protocol fields include TCP, UDP and ICMP, and the data packet types include HTTP, FTP and DNS;
The extraction unit extracts, from the classified IP address related information, the source IP address, the target IP address, the number of activations, the timestamp information, the transmission rate, the data packet quantity value and the connection duration, which form the IP address data set.
Preferably, the data analysis module comprises a flow processing unit, an IP processing unit and a matching association unit;
The flow processing unit verifies and preprocesses the flow data set to obtain traffic packet related information and forms a first data set, which comprises: packet length Sjb, traffic transmission duration Llsc, load Sjfz, and traffic packet number Llb;
The IP processing unit verifies and preprocesses the IP address data set to obtain IP address related information and forms a second data set, which comprises: address packet Dzb, connection duration Ljsc, and number of activations Hycs;
The matching association unit matches the source addresses and target addresses in the flow data set against those in the IP address data set, and establishes and marks the association relations between traffic and IP addresses, the association relations being one-to-many or one-to-one; it records the number of marked association relations and integrates this number with the frequency with which the same IP address occurs to form a third data set, which comprises: IP traffic relation number Gxsl and IP traffic frequency value Plz.
Preferably, the monitoring module comprises a normalization unit and a calculation unit;
The normalization unit performs normalization processing on the first data set, the second data set and the third data set to enable the first data set, the second data set and the third data set to be in the same dimension;
The computing unit performs first computation on the normalized first data set, the normalized second data set and the normalized third data set to obtain: network traffic coefficient Llxs and address access coefficient Fwxs, and then performing a second calculation to obtain: abnormality index Yczs.
Preferably, the network traffic coefficient Llxs is obtained by the following formula:
$$\mathrm{Llxs} = \frac{\left|\frac{\mathrm{Sjb}}{\mathrm{Llsc}}\right| + \left|\frac{\mathrm{Sjfz}}{\mathrm{Llsc}}\right| + \left|\frac{\mathrm{Llb}}{\mathrm{Llsc}}\right|}{\mathrm{Gxsl} + \mathrm{Plz}} + E$$
wherein the absolute values of the ratios of the data packet length Sjb, the load Sjfz and the traffic packet number Llb to the traffic transmission duration Llsc are summed, the sum is divided by the sum of the IP traffic relation number Gxsl and the IP traffic frequency value Plz, and the second correction constant E is added to obtain the network traffic coefficient Llxs;
The network traffic coefficient Llxs is then compared with a preset network traffic threshold L to obtain a network traffic evaluation scheme:
if the network traffic coefficient Llxs is smaller than the network traffic threshold L, the network traffic transmission is not abnormal;
if the network traffic coefficient Llxs is greater than or equal to the network traffic threshold L, the network traffic transmission is abnormal, and a full scan-and-remove pass is performed, abnormal traffic packets are deleted, and abnormal traffic packets are marked.
Preferably, the address access coefficient Fwxs is obtained by:
$$\mathrm{Fwxs} = \frac{\left|\frac{\mathrm{Dzb}}{\mathrm{Ljsc}}\right| + \left|\frac{\mathrm{Hycs}}{\mathrm{Ljsc}}\right|}{\mathrm{Gxsl} + \mathrm{Plz}} + F$$
wherein the absolute values of the ratios of the address data packet Dzb and the number of activations Hycs to the connection duration Ljsc are summed, the sum is divided by the sum of the IP traffic relation number Gxsl and the IP traffic frequency value Plz, and the third correction constant F is added to obtain the address access coefficient Fwxs;
The address access coefficient Fwxs is then compared with a preset address access threshold Z to obtain an address access evaluation scheme:
if the address access coefficient Fwxs is smaller than the address access threshold Z, the access address is not abnormal;
if the address access coefficient Fwxs is greater than or equal to the address access threshold Z, the access address is abnormal, the abnormal address is marked and added to a blacklist, and a traffic upper limit is set for the addresses in that IP segment.
Preferably, the evaluation module comprises a threshold storage unit and a matching unit;
The threshold storage unit is used for storing an abnormal alert threshold Y, a network flow threshold L, an address access threshold Z, an abnormal level policy scheme, a network flow evaluation scheme, an address access evaluation scheme and the contact modes of related notification personnel;
the matching unit compares the abnormality index Yczs with the abnormality alert threshold Y to obtain the abnormality level policy scheme:
if the abnormality index Yczs is smaller than the abnormality alert threshold Y, the traffic packet and the source IP address are not abnormal;
if the abnormality index Yczs is greater than or equal to the abnormality alert threshold Y, the traffic packet and the source IP address are abnormal: virus scanning and removal is performed on the traffic packet, the source IP address contained in the traffic packet is added to a blacklist, a traffic upper limit is added for the IP segment, and staff are notified to trace or scan the traffic packet.
Preferably, the response module comprises an execution unit and a recording unit;
The execution unit executes the corresponding predefined operations and notifies the relevant personnel according to the specific protective measures in the abnormality level policy scheme, wherein the predefined operations include: blocking the attacking source IP, raising the defense level, triggering alarm notifications and limiting the upper and lower bounds of traffic, and the notification modes include: broadcast, short message, pre-recorded phone call and internal application messaging;
The recording unit records the log information generated during execution for post-event audit and analysis, and the recorded information includes: security operation, execution time, execution result and notified personnel.
A data processing method for network security operation based on big data comprises the following steps:
Step one: acquiring real-time network flow data through a flow acquisition module to form a flow data set;
Step two: acquiring real-time IP address related information through an IP acquisition module to form an IP address data set;
Step three: preprocessing a flow data set and an IP address data set through a data analysis module to obtain a first data set, a second data set and a third data set;
Step four: normalizing the first data set, the second data set and the third data set through a monitoring module, and calculating the abnormality index Yczs;
Step five: matching the abnormality index Yczs with a preset abnormality alert threshold Y through an evaluation module to obtain an abnormality level policy scheme;
Step six: executing the content of the abnormality level policy scheme through a response module.
The invention provides a data processing method and a system for network security operation based on big data, which have the following beneficial effects:
(1) When the system operates, the flow acquisition module and the IP acquisition module acquire real-time network traffic data and IP address related information to form a flow data set and an IP address data set. The data analysis module preprocesses both sets to form a first data set of traffic related information and a second data set of IP address related information, then matches the two, records the number of one-to-many or one-to-one relation matches of the IP addresses, and forms a third data set. The monitoring module calculates the abnormality index Yczs from the three data sets, the evaluation module matches it with a preset abnormality alert threshold Y to obtain an abnormality level policy scheme, and the response module executes the content of that scheme. As a result, when traffic abnormalities and IP abnormalities are correlated with each other, they can be monitored effectively and real-time protective measures can be taken; the more comprehensive data analysis and multidimensional matching improve the perception of network threats and raise the overall level of network security operation.
(2) Judging whether network traffic transmission is abnormal provides real-time monitoring of the network traffic state and improves both the precision of abnormality detection and the identification of multiple abnormal states. Detecting IP address related information identifies abnormal addresses and their access counts more flexibly, and marking, blacklisting and setting upper and lower traffic limits further strengthen control over abnormal addresses. Finally, the comprehensive abnormality level policy scheme performs a secondary detection of the overall network traffic and IP addresses and supplies automated, predefined protective measures, so that the spread and occurrence of abnormal states are controlled at the earliest moment.
(3) In the invention, real-time network traffic data is acquired to form a flow data set and real-time IP address related information is acquired to form an IP address data set; both are preprocessed to form a first data set and a second data set; the number of IP address relation matches is obtained and recorded to form a third data set; and the abnormality index Yczs is calculated and matched with a preset abnormality alert threshold Y to obtain an abnormality level policy scheme, whose content is then executed. Because the traffic related information and the IP address related information are matched, the resulting abnormality level policy scheme guides the subsequent response, which improves network security and property security, reduces the consumption of manpower and material resources, and improves the efficiency of processing network security information.
Detailed Description
The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments, and all other embodiments obtained by those skilled in the art without making any inventive effort based on the embodiments of the present invention are within the scope of protection of the present invention.
With the rapid development of the Internet, network security operation has become an important field. In this broad field, big data technology is gradually becoming an indispensable tool for countering network threats and protecting information security; its use in network security is not limited to storing and processing massive data, but also provides comprehensive network security situation awareness through in-depth analysis of network traffic, user behavior and threat information.
However, in conventional network security operation, the monitoring of and response to abnormal traffic and abnormal IP addresses are often limited. Conventional methods tend to rely on limited data and simple rules, so they cannot fully mine the potential threats contained in large volumes of network data. They fall short in particular when traffic abnormalities and IP abnormalities are correlated: for example, a network attack may evade conventional detection rules by imitating normal traffic, so that abnormal requests and IP visitors cannot be identified quickly and accurately. As a result, network security operation lacks flexibility and timeliness in the face of evolving network threats.
Example 1
The invention provides a data processing system for network security operation based on big data which, referring to FIG. 1, comprises a flow acquisition module, an IP acquisition module, a data analysis module, a monitoring module, an evaluation module and a response module;
The flow acquisition module monitors network flow information to acquire real-time network flow data and form a flow data set;
The IP acquisition module monitors network communication content to acquire real-time IP address related information to form an IP address data set;
The data analysis module preprocesses the flow data set and the IP address data set to obtain flow related information, forms a first data set, obtains IP address related information, forms a second data set, matches the flow related information with the IP address related information, obtains one-to-many or one-to-one relation matching quantity of the IP addresses, and records the relation matching quantity to form a third data set;
the monitoring module performs normalization processing on the first data set, the second data set and the third data set, performs calculation, and obtains: abnormality index Yczs;
the abnormality index Yczs is obtained by the following formula:
$$\mathrm{Yczs} = (A \cdot \mathrm{Llxs} + B \cdot \mathrm{Fwxs}) + C$$
Wherein Llxs denotes the network traffic coefficient, Fwxs denotes the address access coefficient, A and B denote the proportionality coefficients of the network traffic coefficient Llxs and the address access coefficient Fwxs, respectively, and C denotes a first correction constant;
The network flow coefficient Llxs is obtained through calculation of the first data set and the third data set, and is compared with a preset network flow threshold L to obtain a network flow evaluation scheme;
the address access coefficient Fwxs is obtained through calculation of the second data set and the third data set, and is compared with a preset address access threshold Z to obtain an address access evaluation scheme;
the evaluation module matches the abnormality index Yczs against an abnormality alert threshold Y to obtain an abnormality level policy scheme;
and the response module executes the content of the abnormality level policy scheme.
In this embodiment, the flow acquisition module and the IP acquisition module acquire real-time network traffic data and IP address related information to form a flow data set and an IP address data set. The data analysis module preprocesses both sets to form a first data set of traffic related information and a second data set of IP address related information, then matches the two, records the number of one-to-many or one-to-one relation matches of the IP addresses, and forms a third data set. The monitoring module calculates the abnormality index Yczs from the three data sets, the evaluation module matches it with a preset abnormality alert threshold Y to obtain an abnormality level policy scheme, and the response module executes the content of that scheme. As a result, when traffic abnormalities and IP abnormalities are correlated with each other, they can be monitored effectively and real-time protective measures can be taken; the more comprehensive data analysis and multidimensional matching improve the perception of network threats and raise the overall level of network security operation.
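The second calculation described in this embodiment can be sketched in Python as follows. This is a minimal illustration, assuming the abnormality index is the weighted sum of the two coefficients plus the first correction constant, which is the form the worked example later in this description evaluates. The function name `abnormality_index` and its parameter names are illustrative and do not appear in the original.

```python
def abnormality_index(llxs: float, fwxs: float, a: float, b: float, c: float) -> float:
    """Assumed form: Yczs = (A * Llxs + B * Fwxs) + C."""
    return (a * llxs + b * fwxs) + c


# Values taken from the worked example in this description (coefficients rounded to 2).
yczs = abnormality_index(llxs=2.0, fwxs=2.0, a=0.47, b=0.48, c=0.1)
print(round(yczs, 2))
```

With the rounded coefficients of the worked example the result is 2, which is below the abnormality alert threshold Y of 5 used there.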
Example 2
This embodiment is described on the basis of Example 1. Referring to FIG. 1, specifically: the flow acquisition module comprises a monitoring unit and an integration unit;
The monitoring unit is used for monitoring data packets in network communication in real time, extracting relevant information in the traffic packets, including source addresses, target addresses, data packet lengths, time stamps, the number of the data packets, loads, service types and port numbers, and marking the acquired information;
the integration unit integrates the marked flow packet information to form a structured data form so as to form a flow data set.
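One possible shape for the structured flow data set is sketched below in Python. The record type `TrafficRecord`, its field names, the `mark` label and the sample addresses are all assumptions made for illustration; the original only lists which items are extracted and states that they are marked and integrated into a structured form.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List


@dataclass
class TrafficRecord:
    """One marked packet observation; field names are illustrative."""
    source_address: str
    target_address: str
    packet_length: int
    timestamp: float
    packet_count: int
    load: int
    service_type: str
    port: int
    mark: str = "traffic"  # the marking step; the label value is an assumption


def integrate(records: List[TrafficRecord]) -> List[Dict[str, object]]:
    """Integration unit: convert marked records into structured rows (the flow data set)."""
    return [asdict(r) for r in records]


flow_data_set = integrate([
    TrafficRecord("10.0.0.5", "10.0.0.9", 1200, 1700000000.0, 150, 50, "web", 443),
])
```

Keeping each observation as a flat row makes the later matching on source and target addresses a simple key lookup.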
The IP acquisition module comprises a classification unit and an extraction unit;
The classification unit classifies the protocol fields and data packet types in the network communication content to obtain the IP address related information, wherein the protocol fields include TCP, UDP and ICMP, and the data packet types include HTTP, FTP and DNS;
The extraction unit extracts, from the classified IP address related information, the source IP address, the target IP address, the number of activations, the timestamp information, the transmission rate, the data packet quantity value and the connection duration, which form the IP address data set.
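The classification and extraction steps could look roughly like the following Python sketch. The packet dictionary keys (`src_ip`, `dst_ip`, `protocol`, `packet_type`, `length`, `timestamp`), the per-pair aggregation and the bytes-per-second rate are assumptions for illustration; the number of activations is left out because the original does not say how it is counted at this stage.

```python
from collections import defaultdict

PROTOCOL_FIELDS = {"TCP", "UDP", "ICMP"}
PACKET_TYPES = {"HTTP", "FTP", "DNS"}


def classify(packets):
    """Classification unit: group packets by (protocol field, packet type)."""
    groups = defaultdict(list)
    for p in packets:
        if p["protocol"] in PROTOCOL_FIELDS and p["packet_type"] in PACKET_TYPES:
            groups[(p["protocol"], p["packet_type"])].append(p)
    return dict(groups)


def extract(groups):
    """Extraction unit: build one IP address data set row per (source, target) pair."""
    rows = {}
    for members in groups.values():
        for p in members:
            key = (p["src_ip"], p["dst_ip"])
            row = rows.setdefault(key, {
                "source_ip": p["src_ip"], "target_ip": p["dst_ip"],
                "packet_count": 0, "total_bytes": 0,
                "first_seen": p["timestamp"], "last_seen": p["timestamp"],
            })
            row["packet_count"] += 1
            row["total_bytes"] += p["length"]
            row["first_seen"] = min(row["first_seen"], p["timestamp"])
            row["last_seen"] = max(row["last_seen"], p["timestamp"])
    for row in rows.values():
        duration = row["last_seen"] - row["first_seen"]
        row["connection_duration"] = duration
        # Transmission rate in bytes per second; 0.0 when the duration is zero.
        row["transmission_rate"] = row["total_bytes"] / duration if duration > 0 else 0.0
    return list(rows.values())


packets = [
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "protocol": "TCP",
     "packet_type": "HTTP", "length": 100, "timestamp": 0.0},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "protocol": "TCP",
     "packet_type": "HTTP", "length": 300, "timestamp": 2.0},
    {"src_ip": "10.0.0.7", "dst_ip": "10.0.0.9", "protocol": "GRE",
     "packet_type": "HTTP", "length": 80, "timestamp": 1.0},
]
ip_address_data_set = extract(classify(packets))
```

In this sample the third packet is dropped at classification because its protocol field is not one of the three listed.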
The data analysis module comprises a flow processing unit, an IP processing unit and a matching association unit;
The flow processing unit verifies and preprocesses the flow data set to obtain traffic packet related information and forms a first data set, which comprises: packet length Sjb, traffic transmission duration Llsc, load Sjfz, and traffic packet number Llb;
The IP processing unit verifies and preprocesses the IP address data set to obtain IP address related information and forms a second data set, which comprises: address packet Dzb, connection duration Ljsc, and number of activations Hycs;
Number of activations Hycs: the number of active communications of the IP address within a fixed period, reflecting how frequently the IP address interacts with the system;
The matching association unit matches the source addresses and target addresses in the flow data set against those in the IP address data set, and establishes and marks the association relations between traffic and IP addresses, the association relations being one-to-many or one-to-one; it records the number of marked association relations and integrates this number with the frequency with which the same IP address occurs to form a third data set, which comprises: IP traffic relation number Gxsl and IP traffic frequency value Plz, wherein an IP address is recorded only when three or more associated traffic packets appear for that same IP address.
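The matching step can be illustrated with the Python sketch below. The original does not define exactly how Gxsl and Plz are counted, so the sketch makes two explicit assumptions: the relation number of an IP address is taken as the number of distinct peer addresses it is associated with, and the frequency value is taken as the number of traffic rows in which the address appears. Only the recording rule (three or more associated traffic packets) comes from the text; the function and key names are hypothetical.

```python
from collections import defaultdict

MIN_ASSOCIATED_PACKETS = 3  # recording rule stated in the text


def match_and_associate(traffic_rows, ip_rows):
    """Match traffic rows to known IP addresses and build a per-address third data set."""
    known_ips = {r["source_ip"] for r in ip_rows} | {r["target_ip"] for r in ip_rows}
    peers = defaultdict(set)
    frequency = defaultdict(int)
    for row in traffic_rows:
        src, dst = row["source_address"], row["target_address"]
        for ip, peer in ((src, dst), (dst, src)):
            if ip in known_ips:
                peers[ip].add(peer)
                frequency[ip] += 1
    third = {}
    for ip, count in frequency.items():
        if count >= MIN_ASSOCIATED_PACKETS:
            third[ip] = {
                # Assumption: one peer means one-to-one, several peers mean one-to-many.
                "relation": "one-to-one" if len(peers[ip]) == 1 else "one-to-many",
                "Gxsl": len(peers[ip]),  # assumed meaning: distinct associated peers
                "Plz": count,            # assumed meaning: occurrences of this address
            }
    return third


traffic_rows = [
    {"source_address": "10.0.0.5", "target_address": "10.0.0.9"},
    {"source_address": "10.0.0.5", "target_address": "10.0.0.10"},
    {"source_address": "10.0.0.5", "target_address": "10.0.0.11"},
    {"source_address": "10.0.0.7", "target_address": "10.0.0.8"},
]
ip_rows = [
    {"source_ip": "10.0.0.5", "target_ip": "10.0.0.9"},
    {"source_ip": "10.0.0.7", "target_ip": "10.0.0.8"},
]
third_data_set = match_and_associate(traffic_rows, ip_rows)
```

In this sample only 10.0.0.5 reaches three associated packets, so it is the only address recorded.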
The monitoring module comprises a normalization unit and a calculation unit;
The normalization unit performs normalization processing on the first data set, the second data set and the third data set to enable the first data set, the second data set and the third data set to be in the same dimension;
The computing unit performs first computation on the normalized first data set, the normalized second data set and the normalized third data set to obtain: network traffic coefficient Llxs and address access coefficient Fwxs, and then performing a second calculation to obtain: abnormality index Yczs.
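The original does not name the normalization method. As one common choice, the sketch below uses min-max scaling to bring every field into the range 0 to 1; this choice, and the helper names `min_max_normalize` and `normalize_data_set`, are assumptions and not the method of the invention.

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def normalize_data_set(rows, fields):
    """Normalize each named field across all rows of one data set."""
    columns = {f: min_max_normalize([row[f] for row in rows]) for f in fields}
    return [{f: columns[f][i] for f in fields} for i in range(len(rows))]


first_data_set = [
    {"Sjb": 600, "Llsc": 100},
    {"Sjb": 900, "Llsc": 200},
    {"Sjb": 1200, "Llsc": 300},
]
normalized = normalize_data_set(first_data_set, ["Sjb", "Llsc"])
```

Because the coefficient formulas divide by the durations, a scaling that can map a duration to 0 would need an offset or a guard before the first calculation.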
Example 3
This embodiment is described on the basis of Example 1. Referring to FIG. 1, specifically: the network traffic coefficient Llxs is obtained by the following formula:
$$\mathrm{Llxs} = \frac{\left|\frac{\mathrm{Sjb}}{\mathrm{Llsc}}\right| + \left|\frac{\mathrm{Sjfz}}{\mathrm{Llsc}}\right| + \left|\frac{\mathrm{Llb}}{\mathrm{Llsc}}\right|}{\mathrm{Gxsl} + \mathrm{Plz}} + E$$
wherein the absolute values of the ratios of the data packet length Sjb, the load Sjfz and the traffic packet number Llb to the traffic transmission duration Llsc are summed, the sum is divided by the sum of the IP traffic relation number Gxsl and the IP traffic frequency value Plz, and the second correction constant E is added to obtain the network traffic coefficient Llxs;
The network traffic coefficient Llxs is then compared with a preset network traffic threshold L to obtain a network traffic evaluation scheme:
if the network traffic coefficient Llxs is smaller than the network traffic threshold L, the network traffic transmission is not abnormal;
if the network traffic coefficient Llxs is greater than or equal to the network traffic threshold L, the network traffic transmission is abnormal, and a full scan-and-remove pass is performed, abnormal traffic packets are deleted, and abnormal traffic packets are marked.
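The calculation and threshold comparison for the network traffic coefficient can be sketched in Python as follows. The sketch assumes the coefficient is the sum of the three absolute ratios divided by (Gxsl + Plz), plus E, which is the form evaluated in the worked example of this description, and it treats values below the threshold as not abnormal. Function names and action labels are illustrative.

```python
def network_traffic_coefficient(sjb, llsc, sjfz, llb, gxsl, plz, e):
    """Assumed form: (|Sjb/Llsc| + |Sjfz/Llsc| + |Llb/Llsc|) / (Gxsl + Plz) + E."""
    if llsc == 0 or gxsl + plz == 0:
        raise ValueError("Llsc and (Gxsl + Plz) must be non-zero")
    ratios = abs(sjb / llsc) + abs(sjfz / llsc) + abs(llb / llsc)
    return ratios / (gxsl + plz) + e


def evaluate_traffic(llxs, threshold_l):
    """Return the actions of the network traffic evaluation scheme (labels are hypothetical)."""
    if llxs < threshold_l:
        return []  # no abnormality, nothing to do
    return ["full_scan_and_remove", "delete_abnormal_packets", "mark_abnormal_packets"]


llxs = network_traffic_coefficient(sjb=1200, llsc=300, sjfz=50, llb=150, gxsl=3, plz=1, e=0.8)
actions = evaluate_traffic(llxs, threshold_l=5)
```

With the worked-example inputs the coefficient is about 1.97, so no action is returned for a threshold of 5.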
The address access coefficient Fwxs is obtained by:
$$\mathrm{Fwxs} = \frac{\left|\frac{\mathrm{Dzb}}{\mathrm{Ljsc}}\right| + \left|\frac{\mathrm{Hycs}}{\mathrm{Ljsc}}\right|}{\mathrm{Gxsl} + \mathrm{Plz}} + F$$
wherein the absolute values of the ratios of the address data packet Dzb and the number of activations Hycs to the connection duration Ljsc are summed, the sum is divided by the sum of the IP traffic relation number Gxsl and the IP traffic frequency value Plz, and the third correction constant F is added to obtain the address access coefficient Fwxs;
The address access coefficient Fwxs is then compared with a preset address access threshold Z to obtain an address access evaluation scheme:
if the address access coefficient Fwxs is smaller than the address access threshold Z, the access address is not abnormal;
if the address access coefficient Fwxs is greater than or equal to the address access threshold Z, the access address is abnormal, the abnormal address is marked and added to a blacklist, and a traffic upper limit is set for the addresses in that IP segment.
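A matching Python sketch for the address access coefficient is given below. It assumes the coefficient is the sum of the two absolute ratios divided by (Gxsl + Plz), plus the third correction constant F, as the description of the formula states. The blacklist set, the action labels and the sample address are hypothetical.

```python
def address_access_coefficient(dzb, ljsc, hycs, gxsl, plz, f):
    """Assumed form: (|Dzb/Ljsc| + |Hycs/Ljsc|) / (Gxsl + Plz) + F."""
    if ljsc == 0 or gxsl + plz == 0:
        raise ValueError("Ljsc and (Gxsl + Plz) must be non-zero")
    ratios = abs(dzb / ljsc) + abs(hycs / ljsc)
    return ratios / (gxsl + plz) + f


def evaluate_address(ip, fwxs, threshold_z, blacklist):
    """Apply the address access evaluation scheme; mutates the blacklist when abnormal."""
    if fwxs < threshold_z:
        return []
    blacklist.add(ip)
    return ["mark_address", "add_to_blacklist", "set_segment_traffic_upper_limit"]


blacklist = set()
fwxs = address_access_coefficient(dzb=500, ljsc=150, hycs=300, gxsl=3, plz=1, f=0.7)
actions = evaluate_address("10.0.0.5", fwxs, threshold_z=5, blacklist=blacklist)
```

With the worked-example inputs the coefficient is about 2.03, so the address is not blacklisted for a threshold of 5.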
The evaluation module comprises a threshold storage unit and a matching unit;
The threshold storage unit is used for storing an abnormal alert threshold Y, a network flow threshold L, an address access threshold Z, an abnormal level policy scheme, a network flow evaluation scheme, an address access evaluation scheme and the contact modes of related notification personnel;
the matching unit compares the abnormality index Yczs with the abnormality alert threshold Y to obtain the abnormality level policy scheme:
if the abnormality index Yczs is smaller than the abnormality alert threshold Y, the traffic packet and the source IP address are not abnormal;
if the abnormality index Yczs is greater than or equal to the abnormality alert threshold Y, the traffic packet and the source IP address are abnormal: virus scanning and removal is performed on the traffic packet, the source IP address contained in the traffic packet is added to a blacklist, a traffic upper limit is added for the IP segment, and staff are notified to trace or scan the traffic packet;
Virus scanning and removal: the abnormal traffic packet is deep-scanned to detect whether it contains malware or viruses, and any found is removed immediately to prevent the malicious code from spreading;
Blacklist addition: the source IP address of the abnormal traffic is marked and added to a blacklist so that it can no longer access the system, which reduces potential security threats;
IP segment traffic upper limit: a traffic upper limit is added for the abnormal IP segment so that potential attacks can be better managed and isolated and normal operation of the network is ensured.
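The threshold storage unit and matching unit can be pictured with the Python sketch below. The class `ThresholdStore`, the action labels and the contact entry are hypothetical names chosen for illustration; the original specifies only which values are stored and which measures follow when the index reaches the threshold.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ThresholdStore:
    """Threshold storage unit: preset thresholds and notification contacts."""
    abnormality_alert_y: float
    network_traffic_l: float
    address_access_z: float
    contacts: Tuple[str, ...] = ()


ABNORMAL_ACTIONS = (
    "virus_scan_traffic_packet",
    "blacklist_source_ip",
    "adjust_ip_segment_traffic_limit",
    "notify_staff",
)


def match_policy(yczs: float, store: ThresholdStore) -> dict:
    """Matching unit: compare Yczs with threshold Y and return the policy scheme."""
    if yczs < store.abnormality_alert_y:
        return {"abnormal": False, "actions": (), "notify": ()}
    return {"abnormal": True, "actions": ABNORMAL_ACTIONS, "notify": store.contacts}


store = ThresholdStore(abnormality_alert_y=5.0, network_traffic_l=5.0,
                       address_access_z=5.0, contacts=("security-duty-desk",))
policy = match_policy(2.0, store)
```

Keeping the thresholds in one immutable object mirrors the text, where a single unit stores all three thresholds together with the contact details.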
The response module comprises an execution unit and a recording unit;
The execution unit executes the corresponding predefined operations and notifies the relevant personnel according to the specific protective measures in the abnormality level policy scheme, wherein the predefined operations include: blocking the attacking source IP, raising the defense level, triggering alarm notifications and limiting the upper and lower bounds of traffic, and the notification modes include: broadcast, short message, pre-recorded phone call and internal application messaging;
The recording unit records the log information generated during execution for post-event audit and analysis, and the recorded information includes: security operation, execution time, execution result and notified personnel;
Security operation record: records the specific security operation executed each time, including the protective measures taken and the processing steps applied to the abnormality;
Execution time record: records the execution time of each operation, including its start time and end time, which helps analyze the security state of the system in different time periods and the abnormal conditions occurring at particular points in time;
Execution result record: records in detail the result of each security operation, including whether the operation succeeded and whether any abnormal condition occurred; these records give system maintenance personnel real-time feedback on the running state of the system.
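A log entry produced by the recording unit might be structured as in the following Python sketch. The function `make_log_record`, the key names, the operation label and the sample timestamps are illustrative assumptions; the original specifies only that the security operation, execution time, execution result and notified personnel are recorded.

```python
from datetime import datetime, timezone


def make_log_record(operation, started_at, finished_at, succeeded, notified):
    """Build one audit log entry for an executed security operation."""
    return {
        "security_operation": operation,
        "start_time": started_at.isoformat(),
        "end_time": finished_at.isoformat(),
        "duration_seconds": (finished_at - started_at).total_seconds(),
        "execution_result": "success" if succeeded else "failure",
        "notified_personnel": list(notified),
    }


record = make_log_record(
    "blacklist_source_ip",
    datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 12, 0, 3, tzinfo=timezone.utc),
    succeeded=True,
    notified=["on-call analyst"],
)
```

Storing both start and end times, rather than a single timestamp, supports the analysis by time period that the text describes.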
In this embodiment, judging whether network traffic transmission is abnormal provides real-time monitoring of the network traffic state and improves both the precision of abnormality detection and the identification of multiple abnormal states. Detecting IP address related information identifies abnormal addresses and their access counts more flexibly, and marking, blacklisting and setting upper and lower traffic limits further strengthen control over abnormal addresses. Finally, the comprehensive abnormality level policy scheme performs a secondary detection of the overall network traffic and IP addresses and supplies automated, predefined protective measures, so that the spread and occurrence of abnormal states are controlled at the earliest moment.
Example 4
Referring to FIG. 2, a data processing method for network security operation based on big data comprises the following steps:
Step one: acquiring real-time network flow data through a flow acquisition module to form a flow data set;
Step two: acquiring real-time IP address related information through an IP acquisition module to form an IP address data set;
Step three: preprocessing a flow data set and an IP address data set through a data analysis module to obtain flow related information, forming a first data set, obtaining IP address related information, forming a second data set, matching the flow related information and the IP address related information, obtaining one-to-many or one-to-one relation matching quantity of the IP addresses, and recording to form a third data set;
Step four: normalizing the first data set, the second data set and the third data set through a monitoring module, and calculating the abnormality index Yczs;
Step five: matching the abnormality index Yczs with a preset abnormality alert threshold Y through an evaluation module to obtain an abnormality level policy scheme;
Step six: executing the content of the abnormality level policy scheme through a response module.
In this embodiment, real-time network traffic data is acquired to form a flow data set and real-time IP address related information is acquired to form an IP address data set; both are preprocessed to form a first data set and a second data set; the number of IP address relation matches is obtained and recorded to form a third data set; and the abnormality index Yczs is calculated and matched with a preset abnormality alert threshold Y to obtain an abnormality level policy scheme, whose content is then executed. Because the traffic related information and the IP address related information are matched, the resulting abnormality level policy scheme guides the subsequent response, which improves network security and property security, reduces the consumption of manpower and material resources, and improves the efficiency of processing network security information.
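The data flow through the six steps can be sketched in Python as a chain of callables. The function `run_pipeline` and the stub lambdas are placeholders standing in for the six modules, not implementations of them; the fixed index of 2.0 and the threshold of 5.0 are borrowed from the worked example in this description.

```python
def run_pipeline(collect_traffic, collect_ip, analyze, monitor, evaluate, respond):
    """Chain the six method steps; each argument stands in for one module."""
    flow_data_set = collect_traffic()                            # step one
    ip_data_set = collect_ip()                                   # step two
    first, second, third = analyze(flow_data_set, ip_data_set)   # step three
    yczs = monitor(first, second, third)                         # step four
    policy = evaluate(yczs)                                      # step five
    return respond(policy)                                       # step six


result = run_pipeline(
    collect_traffic=lambda: [{"packet_length": 1200}],
    collect_ip=lambda: [{"source_ip": "10.0.0.5"}],
    analyze=lambda flow, ip: (flow, ip, {"Gxsl": 3, "Plz": 1}),
    monitor=lambda first, second, third: 2.0,
    evaluate=lambda yczs: "abnormal" if yczs >= 5.0 else "normal",
    respond=lambda policy: f"executed policy: {policy}",
)
```

Passing the modules in as callables keeps each step replaceable and testable on its own.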
Specific example: a data processing system for network security operation based on big data, as used by a certain security operator, is taken to demonstrate with specific parameters and values how the abnormality index Yczs, the network traffic coefficient Llxs and the address access coefficient Fwxs are calculated;
Assume the following parameters:
First data set: packet length Sjb: 1200; traffic transmission duration Llsc: 300; load Sjfz: 50; number of traffic packets Llb: 150;
Second data set: address packet Dzb: 500; connection duration Ljsc: 150; number of activations Hycs: 300;
Third data set: IP traffic relation number Gxsl: 3; IP traffic frequency value Plz: 1;
Correction constant E:0.8;
According to the calculation formula of the network traffic coefficient Llxs:
Llxs=(|1200/300|+|50/300|+|150/300|)/(3+1)+0.8≈1.97≈2;
Setting the network traffic threshold L to 5 and comparing it with the network traffic coefficient Llxs gives: the network traffic coefficient Llxs is smaller than the network traffic threshold L, so the network traffic transmission is not abnormal;
Correction constant F: 0.7;
According to the calculation formula of the address access coefficient Fwxs:
Fwxs=(|500/150|+|300/150|)/(3+1)+0.7≈2.03≈2;
Setting the address access threshold Z to 5 and comparing it with the address access coefficient Fwxs gives: the address access coefficient Fwxs is smaller than the address access threshold Z, so the access address is not abnormal;
Proportionality coefficient A: 0.47; proportionality coefficient B: 0.48; correction constant C: 0.1;
According to the calculation formula of the abnormality index Yczs:
Yczs=[(0.47*2)+(0.48*2)]+0.1=2;
Setting the abnormality alert threshold Y to 5 and comparing it with the abnormality index Yczs gives: the abnormality index Yczs is smaller than the abnormality alert threshold Y, so the traffic packet and the source IP address are not abnormal.
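The arithmetic of this specific example can be checked with a few lines of Python. The sketch uses the parameter values listed above and assumes that both coefficients add their correction constants (E and F) after the division, as the descriptions of the formulas state; unlike the text, it carries the unrounded coefficients into the final step.

```python
p = {
    "Sjb": 1200, "Llsc": 300, "Sjfz": 50, "Llb": 150,   # first data set
    "Dzb": 500, "Ljsc": 150, "Hycs": 300,               # second data set
    "Gxsl": 3, "Plz": 1,                                # third data set
    "E": 0.8, "F": 0.7, "A": 0.47, "B": 0.48, "C": 0.1,
}

denominator = p["Gxsl"] + p["Plz"]
llxs = (abs(p["Sjb"] / p["Llsc"]) + abs(p["Sjfz"] / p["Llsc"])
        + abs(p["Llb"] / p["Llsc"])) / denominator + p["E"]
fwxs = (abs(p["Dzb"] / p["Ljsc"]) + abs(p["Hycs"] / p["Ljsc"])) / denominator + p["F"]
yczs = (p["A"] * llxs + p["B"] * fwxs) + p["C"]

print(f"Llxs={llxs:.4f} Fwxs={fwxs:.4f} Yczs={yczs:.4f}")
# prints: Llxs=1.9667 Fwxs=2.0333 Yczs=2.0003
```

All three values round to 2 and stay below the thresholds of 5, which agrees with the conclusions of the example.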
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.