Disclosure of Invention
Aiming at the defects of the prior art, the application provides an asset data fusion and mapping analysis method and system in an industrial control network environment.
In a first aspect, the application provides an asset data fusion and mapping analysis method in an industrial control network environment, comprising the following steps:
Acquiring and preprocessing asset data of all terminals in an industrial control Internet environment;
Parsing asset node data and access relation data from the preprocessed asset data according to the asset type, the attribution relation and the action record;
Importing the asset node data and the access relation data into a graph database, and formulating asset fusion rules to perform asset data fusion, obtaining an asset data fusion result;
Visually displaying the asset data fusion result as a graph on a data canvas through a graph canvas interface, and providing a query interface for data query and analysis on the basis of the graph.
In some embodiments, the collecting and preprocessing asset data of all terminals in the industrial control internet environment includes:
Asset data from different industrial control equipment sources is consumed through a unified message queue, and records whose key attributes are missing or duplicated are preliminarily filtered out;
and the data is routed to different consumption queues according to the differences in data format among asset types, for subsequent parallel task processing.
In some embodiments, the analyzing asset node data, asset relationship data, and asset tag attributes from the preprocessed asset data according to asset type, attribution relationship, and action records includes:
Polling and processing the data in the different asset queues according to the action records to obtain asset node data and asset relation data, wherein the asset node data comprises a device unique identification code sn, a mac address, login credentials, a device id, a device ip, a host name, a geographic location and a network location, and the asset relation data comprises a source ip, a source port, a destination ip, a destination port, an access time and a data source.
In some embodiments, the importing the asset node data and the access relationship data into the graph database and making an asset fusion rule to perform asset data fusion, to obtain an asset data fusion result, includes:
importing the asset node data and the access relation data into a graph database through a database writing statement;
Matching nodes: matching, from the graph database, any two asset-type device nodes having similar attributes, wherein the attributes comprise a device unique identification code sn, a device id, a device ip, a device mac, a host name, a geographic location, a network location and login credentials;
Setting weights: assigning a weight value to each of the device unique identification code sn, device id, device ip, device mac, host name, geographic location, network location and login credentials according to the characteristics of the attribute, the weight representing the contribution of that attribute to the similarity of device nodes;
Calculating the similarity: determining, through CASE expressions, whether the values of the two device nodes are identical on each attribute; if so, the weight value of that attribute is used in the subsequent calculation, otherwise 0 is used, until all attributes have been evaluated;
Calculating a weighted sum: summing the per-attribute values obtained after all attributes have been compared to obtain the overall similarity of the two nodes, and summing the original weight values of all attributes to obtain the total weight;
Screening similar nodes: obtaining a similarity ratio from the overall similarity and the total weight, where similarity ratio = overall similarity / total weight, and determining that the two device nodes are similar nodes if the similarity ratio is greater than a threshold;
Merging nodes: merging the two similar nodes into one node, wherein, if the nodes have different values on an attribute, a merging strategy is selected to merge those attribute values; the asset data fusion result is thereby obtained, and the merged node is returned as the result.
In some embodiments, the performing, through a graph canvas interface, the graph visualization of the asset data fusion result on a data canvas includes:
The front end initiates a request: when the canvas is loaded on the Web front-end interface, the front end sends a GET /api/graph request to the back end;
The back end processes the request: the back end queries all device nodes and the relationships between them from the graph database;
The back end formats the graph database query result into a JSON format suitable for front-end rendering, the JSON containing node information and relationship information;
The back end returns the node and relationship data to the front end;
The front end displays the graph: the front end draws the graph according to the node and relationship data.
In some embodiments, the concurrently providing a query interface performs data query analysis on a mapped basis, including:
Entering the device unique identification code sn or the device ip on the Web interface to query specific asset information;
The front end sends a GET /api/device/{sn} or a GET /api/device/ip/{ip} request to the back end;
The back end executes a MATCH query on the graph database using the device unique identification code sn or the device ip to obtain the detailed information of the device;
The detailed data of the device is obtained by querying the attributes of the device and the relationships associated with it;
The back end returns the detailed information and relationships of the device to the front end in JSON format for display.
In a second aspect, the application provides a system comprising a data acquisition preprocessing module, an asset data preliminary analysis module, an asset data fusion module and a mapping analysis module;
The data acquisition preprocessing module is used for acquiring and preprocessing asset data of all terminals in an industrial control internet environment;
The asset data preliminary analysis module is used for analyzing the asset node data and the access relation data from the preprocessed asset data according to the asset type, the attribution relation and the action record;
The asset data fusion module is used for importing the asset node data and the access relation data into a graph database, and formulating asset fusion rules to perform asset data fusion to obtain an asset data fusion result;
The mapping analysis module is used for visually displaying the asset data fusion result as a graph on the data canvas through a graph canvas interface, and for providing a query interface for data query and analysis on the basis of the graph.
In some embodiments, a data cleansing interface is also included for providing an interface for external active cleansing of obsolete data.
In a third aspect the application proposes an electronic device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, said processor implementing the steps of the method as described above when said computer program is executed.
In a fourth aspect the application proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of a method as described above.
The invention has the beneficial effects that:
Storing and displaying asset interaction data in the industrial control internet environment with graph database technology achieves the following notable effects:
1. Improved performance: the graph model greatly improves data storage and query efficiency and is suited to large-data-volume, high-concurrency scenarios.
2. Convenient in-depth analysis: graph analysis is natively supported, the mining of complex relationships is simplified, and the associations among assets are displayed visually in real time.
3. Effective deduplication and fusion: asset nodes and their various relationships are kept unique, improving data quality and consistency.
4. Enhanced visualization: a dynamic, easily understood network view is provided, helping to quickly identify system risks and optimization points.
5. Timely and accurate decision support: a powerful tool is provided for industrial control network management and security monitoring, promoting efficient operation and maintenance decisions.
In general, the method and system overcome the limitations of traditional storage technology in constructing and displaying dynamic behavior data of assets in an industrial control network environment, and improve the scalability and intelligence of the overall system. They address the performance bottlenecks, insufficient big-data processing capacity and inconvenient graph analysis that arise in the prior art when traffic data among assets is stored in a relational database or in text form.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While the exemplary embodiments of the present invention have been illustrated in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein, but rather, these embodiments are provided so that the present invention will be more thoroughly understood and will fully convey the scope of the invention to those skilled in the art.
In a first aspect, the present application proposes a method for asset data fusion and mapping analysis in an industrial control network environment, as shown in fig. 1, including the following steps:
S100, collecting and preprocessing asset data of all terminals in an industrial control internet environment;
In some embodiments, the collecting and preprocessing asset data of all terminals in the industrial control internet environment includes:
Asset data from different industrial control equipment sources is consumed through a unified message queue, and records whose key attributes are missing or duplicated are preliminarily filtered out;
and the data is routed to different consumption queues according to the differences in data format among asset types, for subsequent parallel task processing.
As shown in FIG. 2, real-time capture and centralized processing of asset interaction data in the industrial control internet environment are implemented using a Syslog system together with efficient message queue mechanisms such as Kafka. By configuring flexible and extensible data source interfaces, massive, multi-source, heterogeneous traffic data can be received accurately and integrated preliminarily. For example, terminal devices in the industrial control network (such as PLCs, sensors and gateways) report asset data through a unified message queue (such as Kafka), and the data fields include device SN, MAC address, IP, geographic location and the like.
Filtering redundancy: records whose key attributes (such as SN and IP) are missing or duplicated are removed. Specifically, the collected information is filtered and extracted in real time, non-target attributes and redundant data are removed, and key asset dynamic behavior information is retained. This process can use an ETL tool (such as the open source tool DataX) and supports dynamic rule configuration and complex event processing, so that the preprocessing stage meets high-throughput requirements while accurately identifying and retaining valuable asset relationship data.
Normalizing formats: heterogeneous data of different device types is converted into a uniform format according to preset templates and distributed into independent consumption queues by asset type (such as network communication assets and security protection assets) to enable parallel processing.
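The filtering and routing described above can be illustrated with a minimal in-memory sketch. It does not use Kafka or DataX; the function name `preprocess`, the field name `assetType` and the choice of `sn` and `ip` as the key attributes are assumptions made for illustration only.

```javascript
// Key attributes every record must carry (assumed here to be SN and IP,
// following the examples given in the text).
const KEY_ATTRIBUTES = ['sn', 'ip'];

/**
 * Drop records with missing or duplicated key attributes, then route the
 * remaining records into one in-memory queue per asset type.
 * `assetType` is a hypothetical field name for the asset category.
 */
function preprocess(records) {
  const seen = new Set();   // fingerprints of records already accepted
  const queues = new Map(); // asset type -> array of records

  for (const record of records) {
    const missing = KEY_ATTRIBUTES.some(
      (key) => record[key] === undefined || record[key] === null || record[key] === ''
    );
    if (missing) continue; // a key attribute is missing

    const fingerprint = KEY_ATTRIBUTES.map((key) => record[key]).join('|');
    if (seen.has(fingerprint)) continue; // key attributes are duplicated
    seen.add(fingerprint);

    const type = record.assetType || 'unknown';
    if (!queues.has(type)) queues.set(type, []);
    queues.get(type).push(record);
  }
  return queues;
}
```

In a deployment along the lines of the text, each entry of the returned map would correspond to a separate consumption queue processed by its own parallel task.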
S200, parsing asset node data and access relation data from the preprocessed asset data according to the asset type, the attribution relation and the action record;
in some embodiments, the analyzing asset node data, asset relationship data, and asset tag attributes from the preprocessed asset data according to asset type, attribution relationship, and action records includes:
Polling and processing the data in the different asset queues according to the action records to obtain asset node data and asset relation data, wherein the asset node data comprises a device unique identification code sn, a mac address, login credentials, a device id, a device ip, a host name, a geographic location and a network location, and the asset relation data comprises a source ip, a source port, a destination ip, a destination port, an access time and a data source.
Wherein the assets are classified by type into network communication assets, video monitoring assets, security protection assets and the like; the attribution relationships are classified by data source into asset access relationships, asset attack relationships, asset management relationships and the like; and the asset node attributes extracted from the preprocessed data include the device unique identification code (SN), MAC address, device ID, IP address, host name, geographic location, network location and login credentials (stored encrypted);
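As a rough illustration of this parsing step, the sketch below splits one preprocessed record into its node part and its relation part. The field names follow the attribute lists above; the function names `pick` and `parseRecord` and the flat record layout are assumptions, and credential encryption is deliberately left out.

```javascript
// Attribute names taken from the lists in the text.
const NODE_FIELDS = ['sn', 'mac', 'credentials', 'deviceId', 'ip',
  'hostname', 'location', 'networkLocation'];
const RELATION_FIELDS = ['sourceIp', 'sourcePort', 'destinationIp',
  'destinationPort', 'accessTime', 'dataSource'];

// Copy only the listed fields that are present on the source object.
function pick(source, fields) {
  const result = {};
  for (const field of fields) {
    if (source[field] !== undefined) result[field] = source[field];
  }
  return result;
}

/**
 * Split a flat record (hypothetical layout) into asset node data and
 * access relation data. A part that has no fields is returned as null.
 */
function parseRecord(record) {
  const node = pick(record, NODE_FIELDS);
  const relation = pick(record, RELATION_FIELDS);
  return {
    node: Object.keys(node).length > 0 ? node : null,
    relation: Object.keys(relation).length > 0 ? relation : null,
  };
}
```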
S300, importing the asset node data and the access relation data into a graph database, and simultaneously making asset fusion rules to perform asset data fusion to obtain an asset data fusion result;
In this embodiment, the graph database is Neo4j. The Neo4j service must be deployed before data is imported; after deployment, a basic Cypher query is executed through the Neo4j Browser to confirm that the service is running normally;
In some embodiments, the importing the asset node data and the access relationship data into the graph database and making an asset fusion rule to perform asset data fusion, to obtain an asset data fusion result, includes:
importing the asset node data and the access relation data into a graph database through a database writing statement;
the database writing statement of the asset node data is as follows:
// Create an asset node (Device) containing device information
CREATE (d:Device {
  sn: 'sn_12345',                        // device serial number
  deviceId: 'device_001',                // device ID
  ip: '192.168.1.1',                     // IP address of the device
  mac: '00:14:22:01:23:45',              // MAC address of the device
  hostname: 'device-hostname',           // hostname of the device
  location: 'machine room A',            // device location
  networkLocation: 'network segment 1',  // network location
  credentials: 'admin_password'          // credential information of the device (e.g., password)
})
RETURN d; // return the created device node
The database writing statement of the access relation data is as follows:
// Match two device nodes, sn_12345 and sn_67890
MATCH (a:Device {sn: 'sn_12345'}), (b:Device {sn: 'sn_67890'})
// Create an access relationship and set its attributes
CREATE (a)-[r:ACCESS {
  sourceIp: '192.168.1.1',       // source IP
  sourcePort: 8080,              // source port
  destinationIp: '192.168.1.2',  // destination IP
  destinationPort: 80,           // destination port
  accessTime: datetime(),        // access time (current time)
  dataSource: 'log_file'         // data source (e.g., log file)
}]->(b)
// Return the two device nodes and their access relationship
RETURN a, r, b;
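When such statements are issued from application code rather than typed by hand, the values are usually passed as query parameters instead of being spliced into the query text. The following sketch only builds the query string and the parameter object and does not call any database driver; the helper names `buildCreateDevice` and `buildCreateAccess` are hypothetical, and passing the access time as an ISO 8601 string is an assumption.

```javascript
// Build a parameterized statement that creates one Device node.
// `node` is a plain object of device attributes (sn, ip, mac, ...).
function buildCreateDevice(node) {
  return {
    query: 'CREATE (d:Device $props) RETURN d',
    params: { props: { ...node } },
  };
}

// Build a parameterized statement that creates an ACCESS relationship
// between two existing Device nodes identified by their sn values.
function buildCreateAccess(sourceSn, destinationSn, relation) {
  // accessTime is handled separately so it can be converted with datetime().
  const { accessTime, ...props } = relation;
  return {
    query: [
      'MATCH (a:Device {sn: $sourceSn}), (b:Device {sn: $destinationSn})',
      'CREATE (a)-[r:ACCESS $props]->(b)',
      'SET r.accessTime = datetime($accessTime)',
      'RETURN a, r, b',
    ].join('\n'),
    params: {
      sourceSn,
      destinationSn,
      props,
      // Fall back to the current time when the record carries no timestamp.
      accessTime: accessTime ?? new Date().toISOString(),
    },
  };
}
```

The returned `query` and `params` pair would then be handed to whichever Neo4j client the back end uses.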
Matching nodes: matching, from the graph database, any two asset-type device nodes having similar attributes, wherein the attributes comprise a device unique identification code sn, a device id, a device ip, a device mac, a host name, a geographic location, a network location and login credentials;
suppose now that there are two device nodes:
n1 (device 1): sn="sn_12345", deviceId="dev_001", ip="192.168.1.1", mac="00:14:22:01:23:45", hostname="device1", location="machine room A", networkLocation="network segment 1", credentials="admin";
n2 (device 2): sn="sn_12345", deviceId="dev_001", ip="192.168.1.2", mac="00:14:22:01:23:45", hostname="device1", location="machine room A", networkLocation="network segment 2", credentials="admin";
Setting weight, namely formulating a corresponding weight value for the unique equipment identification code sn, equipment id, equipment ip, equipment mac, host name, geographic position, network position and login credentials according to the characteristics of the attribute, and representing the similarity contribution of the attribute to the equipment node;
the specific weight settings are shown in table 1 below:
TABLE 1
| Attribute | Characteristics | Weight |
| --- | --- | --- |
| Device unique identification code sn | Unique SN code serving as the identity | 10 |
| MAC address | The real MAC may be hidden by MAC address spoofing or MAC address cloning | 7 |
| Login credentials | May change with software reinstallation, data clearing, or user privacy settings | 6 |
| Device id | May change with software reinstallation, data clearing, or user privacy settings | 5 |
| Device ip | May change with software reinstallation, data clearing, or user privacy settings | 4 |
| Host name | May change with software reinstallation, data clearing, or user privacy settings | 3 |
| Geographic location | Dynamically updated as the device migrates | 2 |
| Network location | May be dynamically updated as the network is re-planned | 1 |
In this embodiment, the weights are therefore: device unique identification code sn (10), device mac address (7), login credentials (6), device id (5), device ip (4), hostname (3), geographic location (2) and network location (1).
Calculating the similarity: determining, through CASE expressions, whether the values of the two device nodes are identical on each attribute; if so, the weight value of that attribute is used in the subsequent calculation, otherwise 0 is used, until all attributes have been evaluated;
Applying this calculation to devices n1 and n2 gives the following preliminary results:
the device sn values are identical, so snSim = 10 (i.e., weight 10);
the device ID (deviceId) values are identical, so deviceIdSim = 5 (i.e., weight 5);
the device ip values are different, so ipSim = 0 (no match);
the device mac values are identical, so macSim = 7 (i.e., weight 7);
the hostname values are identical, so hostnameSim = 3 (i.e., weight 3);
the geographic location (location) values are identical, so locationSim = 2 (i.e., weight 2);
the network location (networkLocation) values are different, so networkLocationSim = 0 (no match);
the login credentials (credentials) values are identical, so credentialsSim = 6 (i.e., weight 6);
Calculating a weighted sum: the per-attribute values obtained after all attributes have been compared are summed to obtain the overall similarity of the two nodes, and the original weight values of all attributes are summed to obtain the total weight;
Overall similarity (overallSimilarity) = 10 + 5 + 0 + 7 + 3 + 2 + 0 + 6 = 33
Total weight (totalWeight) = 10 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = 38
Screening similar nodes: a similarity ratio is obtained from the overall similarity and the total weight, where similarity ratio = overall similarity / total weight, and the two device nodes are determined to be similar nodes if the similarity ratio is greater than a threshold;
Similarity ratio (simVal) = 33/38 ≈ 86.8%
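The weighting, comparison and screening steps can also be written out in application code, which is convenient for checking a pair of nodes by hand. The sketch below uses the weights of Table 1; the function names `similarityRatio` and `isSimilar` are hypothetical, and the threshold is left as a caller-supplied parameter because the text does not fix its value.

```javascript
// Weights from Table 1, keyed by the attribute names used in the examples.
const WEIGHTS = {
  sn: 10, mac: 7, credentials: 6, deviceId: 5,
  ip: 4, hostname: 3, location: 2, networkLocation: 1,
};

/**
 * Compare two device nodes attribute by attribute. An attribute contributes
 * its weight when both nodes carry the same value, and 0 otherwise.
 */
function similarityRatio(n1, n2, weights = WEIGHTS) {
  let overallSimilarity = 0;
  let totalWeight = 0;
  for (const [attribute, weight] of Object.entries(weights)) {
    totalWeight += weight;
    const bothPresent = n1[attribute] !== undefined && n2[attribute] !== undefined;
    if (bothPresent && n1[attribute] === n2[attribute]) {
      overallSimilarity += weight;
    }
  }
  return { overallSimilarity, totalWeight, simVal: overallSimilarity / totalWeight };
}

// Two nodes are similar when the ratio exceeds the (assumed) threshold.
function isSimilar(n1, n2, threshold, weights = WEIGHTS) {
  return similarityRatio(n1, n2, weights).simVal > threshold;
}
```

Calling `similarityRatio(a, b)` returns the intermediate sums as well as the ratio, so each step of the calculation can be inspected.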
Merging nodes: the two similar nodes are merged into one node, wherein, if the nodes have different values on an attribute, a merging strategy is selected to merge those attribute values; the asset data fusion result is thereby obtained, and the merged node is returned as the result.
n3 (fusion of device 1 and device 2): sn="sn_12345", deviceId="dev_001", ip="[192.168.1.1, 192.168.1.2]", mac="00:14:22:01:23:45", hostname="device1", location="machine room A", networkLocation="[network segment 1, network segment 2]", credentials="admin";
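One possible merging strategy, matching the fused node shown above, is to keep a single value where the two nodes agree and to collect all distinct values into a list where they differ. The sketch below illustrates that strategy only; the function name `mergeNodes` is hypothetical, and other strategies (for example, keeping the most recent value) are equally compatible with the text.

```javascript
/**
 * Merge two device nodes into one. Attributes with a single distinct value
 * keep that value; attributes with several distinct values become a list.
 * Values that are already lists (from an earlier merge) are flattened first.
 */
function mergeNodes(n1, n2) {
  const merged = {};
  const attributes = new Set([...Object.keys(n1), ...Object.keys(n2)]);
  for (const attribute of attributes) {
    const values = [n1[attribute], n2[attribute]]
      .flat()
      .filter((value) => value !== undefined && value !== null);
    const unique = [...new Set(values)];
    if (unique.length === 0) continue;   // neither node has a value
    merged[attribute] = unique.length === 1 ? unique[0] : unique;
  }
  return merged;
}
```

Because lists are flattened before deduplication, the result of one merge can itself be merged with a further similar node.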
For example, in an industrial control network, the same physical device may be recorded as two independent nodes, device A and device B, because its IP address has changed. Under the fusion rules, the SN and the MAC address of the two nodes are found to be identical; taking these two attributes into account, the similarity ratio is (10+7)/17=100%, which triggers the merge operation and finally produces a unified node.
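Inside the graph database, the CASE-based comparison and the threshold screening of this step could be expressed as one Cypher query generated from the weight table. The sketch below only assembles the query text; the function name `buildSimilarityQuery` and the `$threshold` parameter are assumptions, the attribute names are assumed to be trusted identifiers, and `id()` is used merely to avoid comparing a pair twice (newer Neo4j versions offer `elementId()` for the same purpose).

```javascript
/**
 * Generate a Cypher query that scores every pair of Device nodes with one
 * CASE expression per weighted attribute and keeps the pairs whose
 * similarity ratio exceeds $threshold.
 * `weights` maps attribute names to weight values, e.g. { sn: 10, mac: 7 }.
 */
function buildSimilarityQuery(weights) {
  const entries = Object.entries(weights);
  const totalWeight = entries.reduce((sum, [, weight]) => sum + weight, 0);
  const cases = entries
    .map(([attribute, weight]) =>
      `CASE WHEN n1.${attribute} = n2.${attribute} THEN ${weight} ELSE 0 END`)
    .join(' +\n     ');
  return [
    'MATCH (n1:Device), (n2:Device)',
    'WHERE id(n1) < id(n2)',
    `WITH n1, n2,\n     ${cases} AS overallSimilarity`,
    `WITH n1, n2, toFloat(overallSimilarity) / ${totalWeight} AS simVal`,
    'WHERE simVal > $threshold',
    'RETURN n1, n2, simVal',
  ].join('\n');
}
```

In Cypher, a comparison involving a missing property evaluates to null, so the CASE expression falls through to its ELSE branch and the attribute contributes 0.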
S400, visually displaying the asset data fusion result as a graph on a data canvas through a graph canvas interface, and providing a query interface for data query and analysis on the basis of the graph.
In some embodiments, the performing, through a graph canvas interface, the graph visualization of the asset data fusion result on a data canvas includes:
The front end initiates a request: when the canvas is loaded on the Web front-end interface, the front end sends a GET /api/graph request to the back end;
The back end processes the request: the back end queries all device nodes and the relationships between them from the graph database;
The back end formats the graph database query result into a JSON format suitable for front-end rendering, the JSON containing node information and relationship information;
The back end returns the node and relationship data to the front end;
The front end displays the graph: the front end draws the graph according to the node and relationship data.
The execution codes of the visual display are as follows:
// Front end: request the graph data
fetch('/api/graph', { method: 'GET' })
  .then(response => response.json())
  .then(data => {
    // Draw the graph from the returned data
    renderGraph(data.nodes, data.relationships);
  })
  .catch(error => console.error('Error fetching graph data:', error));
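On the back-end side, the formatting step that produces the `nodes` and `relationships` fields consumed above might look like the following sketch. The input shape (an array of rows, each holding plain property objects `a`, `r` and `b` for a source node, a relationship and a target node) and the function name `formatGraph` are assumptions; a real driver result would first have to be converted into this shape.

```javascript
/**
 * Turn rows of { a, r, b } into { nodes, relationships } for the front end.
 * Nodes are deduplicated by their sn; each relationship records the sn of
 * its source and target nodes together with its own properties.
 */
function formatGraph(rows) {
  const nodes = new Map();      // sn -> node properties
  const relationships = [];
  for (const { a, r, b } of rows) {
    nodes.set(a.sn, a);
    nodes.set(b.sn, b);
    relationships.push({ source: a.sn, target: b.sn, type: 'ACCESS', ...r });
  }
  return { nodes: [...nodes.values()], relationships };
}
```

The returned object can be serialized directly with `JSON.stringify` as the response body of GET /api/graph.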
In some embodiments, the concurrently providing a query interface performs data query analysis on a mapped basis, including:
Entering the device unique identification code sn or the device ip on the Web interface to query specific asset information;
The front end sends a GET /api/device/{sn} or a GET /api/device/ip/{ip} request to the back end;
The back end executes a MATCH query on the graph database using the device unique identification code sn or the device ip to obtain the detailed information of the device;
The detailed data of the device is obtained by querying the attributes of the device and the relationships associated with it;
The back end returns the detailed information and relationships of the device to the front end in JSON format for display.
The execution code of the data query is as follows:
// Send a request to obtain the information of the specified device
fetch('/api/device/sn_12345', { method: 'GET' })
  .then(response => response.json())
  .then(data => {
    // Show the device information
    showDeviceDetails(data.device);
  })
  .catch(error => console.error('Error fetching device data:', error));
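For the back-end half of this query flow, the mapping from the two request paths to a MATCH query could be sketched as follows. The function name `buildDeviceQuery`, the `$value` parameter and the exact shape of the returned query are assumptions; the sketch builds the query text and parameters only and leaves execution to the database client in use.

```javascript
/**
 * Translate /api/device/{sn} or /api/device/ip/{ip} into a parameterized
 * MATCH query. Returns null for any other path.
 */
function buildDeviceQuery(path) {
  const byIp = /^\/api\/device\/ip\/([^\/]+)$/.exec(path);
  const bySn = /^\/api\/device\/([^\/]+)$/.exec(path);

  let key;
  let value;
  if (byIp) {
    key = 'ip';
    value = decodeURIComponent(byIp[1]);
  } else if (bySn) {
    key = 'sn';
    value = decodeURIComponent(bySn[1]);
  } else {
    return null;
  }

  return {
    query: [
      `MATCH (d:Device {${key}: $value})`,
      // Also fetch the relationships associated with the device, if any.
      'OPTIONAL MATCH (d)-[r:ACCESS]-(other:Device)',
      'RETURN d, collect(r) AS relations, collect(other) AS neighbours',
    ].join('\n'),
    params: { value },
  };
}
```

Only the property key (`sn` or `ip`) is interpolated into the query text, and it is chosen from a fixed set; the user-supplied value travels as a parameter.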
In a second aspect, the application provides a system comprising a data acquisition preprocessing module, an asset data preliminary analysis module, an asset data fusion module and a mapping analysis module;
The data acquisition preprocessing module is used for acquiring and preprocessing asset data of all terminals in an industrial control internet environment;
The asset data preliminary analysis module is used for analyzing the asset node data and the access relation data from the preprocessed asset data according to the asset type, the attribution relation and the action record;
The asset data fusion module is used for importing the asset node data and the access relation data into a graph database, and formulating asset fusion rules to perform asset data fusion to obtain an asset data fusion result;
The mapping analysis module is used for visually displaying the asset data fusion result as a graph on the data canvas through a graph canvas interface, and for providing a query interface for data query and analysis on the basis of the graph.
In some embodiments, a data cleansing interface is also included for providing an interface for external active cleansing of obsolete data.
In a third aspect the application proposes an electronic device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, said processor implementing the steps of the method as described above when said computer program is executed.
In a fourth aspect the application proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of a method as described above.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus/computer device and method may be implemented in other manners. For example, the apparatus/computer device embodiments described above are merely illustrative, e.g., the division of modules or elements is merely a logical functional division, and there may be additional divisions of actual implementations, multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the present disclosure may implement all or part of the flow of the method of the above-described embodiments by means of a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, may implement the steps of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file or some intermediate form. The computer-readable medium can include any entity or device capable of carrying computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals and telecommunications signals according to legislation and patent practice.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and improvements made by those skilled in the art without departing from the present technical solution shall be considered as falling within the scope of the claims.