Disruption of operational technology (OT) in modern industrial enterprises and infrastructure can lead to serious consequences

For example, at a chemical plant, an OT failure can cause major financial losses through deteriorating product quality, equipment damage, or production stoppages. The main threats to OT include unintentional errors or malicious actions in operational control systems; the deterioration, malfunction, and physical disabling of equipment and machinery; and hacker attacks on control systems.

Timely detection of faults in OT is a critical, highly complex task

It becomes even harder to detect anomalies in technological processes if they are caused by sabotage or a concealed hacker attack. Protection of technological processes against malfunction and outside interference is traditionally based on expert systems with sets of rules that determine when certain process indicators go outside a permissible range. The number of rules in such an expert system can be very large, especially given that an industrial facility may operate in different modes. It is difficult to keep so many rules up to date and monitor them in real time, so in practice the tolerance margins are often generous. This means that faults in technological processes are often detected in the late stages of their development.
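To make the rule-based approach concrete, here is a minimal sketch of a range check that depends on the operating mode. The mode names, the indicator name, and the limits are hypothetical and chosen only for illustration; a real expert system would hold far more rules than this.

```python
# Hypothetical rule table: permissible (low, high) range per operating mode
# and per process indicator. All names and limits are illustrative only.
RULES = {
    "startup": {"reactor_pressure_bar": (0.0, 12.0)},
    "steady": {"reactor_pressure_bar": (8.0, 12.0)},
}


def violated_rules(mode, readings):
    """Return the indicators whose reading is outside the range for this mode."""
    violations = []
    for name, (low, high) in RULES[mode].items():
        value = readings.get(name)
        if value is not None and not low <= value <= high:
            violations.append(name)
    return violations


# The same reading is acceptable during startup but a violation in steady mode.
reading = {"reactor_pressure_bar": 7.5}
print(violated_rules("startup", reading))
print(violated_rules("steady", reading))
```

Every mode multiplies the number of ranges that have to be maintained. Widening the limits is the easy way to keep the table manageable, and that is what delays detection.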

Machine-learning technologies can facilitate the detection of faults in OT

A neural network trained on the historical operating data of the enterprise can monitor thousands of parameters in real time, and identify the tiniest deviation in a technological process. If there are changes to the technological process, a neural network can be quickly retrained, whereas restructuring the rules of an expert system within the same time frame is difficult and costly.

We have created, patented, and continue to develop Machine Learning for Anomaly Detection (MLAD)

An anomaly in this context means a significant deviation between the actual and expected value of a process indicator.

Our technology works with the telemetry of process control systems, and does not require additional sensors to be installed.

The telemetry of a technological process consists of tens of thousands of interconnected signals from control sensors and commands. The connections between the signals are set in the control logic of an ICS during its design, and are determined by the physical features of the technological process, operating conditions, input parameters, and other factors. There are many such connections at a large industrial facility. Even an experienced process engineer may not know about all of them. Changes in some signals inevitably produce changes in others. This feature of technological process telemetry is key to the success of our technology.

Essentially, our technology finds anomalies in data with the following characteristics:

  • data must be a multivariate time series
  • time series must contain values for multiple (10 to 10,000) parameters
  • the measurement interval must be in a range from 100 milliseconds to 24 hours
  • the values of the various parameters should be interconnected (by physical laws, control logic, process logic, etc.)
  • the parameters should include those most significant for observation, and those that lie at (or as close as possible to) the root cause of an anomaly that affects other parameters
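The measurable requirements in this list can be checked before any model is trained. The sketch below is a hypothetical pre-flight check, not part of the product: the function name and the data layout (a list of timestamps plus one row of values per timestamp) are assumptions. It covers only the parameter count and the measurement interval, because whether the parameters are interconnected cannot be verified this simply.

```python
from datetime import datetime, timedelta

# Limits taken from the list above.
MIN_PARAMS, MAX_PARAMS = 10, 10_000
MIN_STEP = timedelta(milliseconds=100)
MAX_STEP = timedelta(hours=24)


def check_telemetry(timestamps, rows):
    """Return a list of problems; an empty list means the basic checks pass.

    timestamps: list of datetime objects, one per sample (assumed layout)
    rows: list of equal-length lists, one value per parameter
    """
    if len(timestamps) != len(rows) or len(rows) < 2:
        return ["need at least two timestamped samples"]
    problems = []
    widths = {len(row) for row in rows}
    if len(widths) != 1:
        problems.append("rows have inconsistent parameter counts")
    else:
        count = widths.pop()
        if not MIN_PARAMS <= count <= MAX_PARAMS:
            problems.append(f"{count} parameters is outside {MIN_PARAMS}..{MAX_PARAMS}")
    steps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if any(step < MIN_STEP or step > MAX_STEP for step in steps):
        problems.append("measurement interval outside 100 ms .. 24 h")
    return problems


# Five samples, one second apart, with twelve parameters each.
start = datetime(2024, 1, 1)
stamps = [start + timedelta(seconds=i) for i in range(5)]
samples = [[0.0] * 12 for _ in range(5)]
print(check_telemetry(stamps, samples))
```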

Our technology builds a machine-learning (ML) model of the technological object

The ML model is constructed on the basis of data about the technological process, and is trained on the historical operating data of the enterprise. Once trained, the ML model can predict the future values of technological parameters on the basis of their current values. The deviations of the actual parameter values from the predicted values are then summed, so when the process departs from its normal behavior, the overall deviation across all technological parameters grows quickly.

  • Predictions are based on the aggregate technological parameter values already received for a certain period – the input window.
  • On the basis of the input window, the neural network in the ML model predicts the values that the technological parameters should take during a certain time interval (the prediction window) at a specified point in the near future (the prediction horizon).
  • Using the difference between the predicted values of the technological parameters and those actually observed, Kaspersky MLAD calculates prediction errors for each parameter.
  • Based on the aggregate prediction errors, Anomaly Detector calculates the mean square error (MSE). Each technological parameter is assigned a weight that is used in calculating the error. For example, for an ICS at a chemical plant, the readings of the pressure sensor inside the reactor are more important than the readings of the atmospheric pressure sensor on the plant floor, so a deviation in the former will have a greater impact on the overall error.
  • An anomaly is recorded when the MSE exceeds a certain threshold, which is predefined during creation of the ML model.
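The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Kaspersky MLAD implementation. The predictor is a trivial stand-in that repeats the last observed values (the product uses a trained neural network). The prediction horizon is assumed to be the number of steps between the end of the input window and the start of the prediction window. The weighted error is assumed to be normalized by the sum of the weights. All names, weights, and the threshold are hypothetical.

```python
def split_windows(series, t, input_len, horizon, pred_len):
    """Return the input window ending at index t and the observed values
    in the prediction window that starts `horizon` steps after t."""
    input_window = series[t - input_len:t]
    start = t + horizon
    return input_window, series[start:start + pred_len]


def persistence_predict(input_window, pred_len):
    """Stand-in predictor: repeat the last observed row (NOT a neural network)."""
    return [list(input_window[-1]) for _ in range(pred_len)]


def weighted_mse(predicted, observed, weights):
    """Mean squared prediction error with a per-parameter weight."""
    total = 0.0
    for predicted_row, observed_row in zip(predicted, observed):
        for p, o, w in zip(predicted_row, observed_row, weights):
            total += w * (p - o) ** 2
    return total / (len(predicted) * sum(weights))


def is_anomaly(predicted, observed, weights, threshold):
    """An anomaly is recorded when the weighted MSE exceeds the threshold."""
    return weighted_mse(predicted, observed, weights) > threshold


# Two parameters: [reactor pressure, atmospheric pressure]; the first weighs more.
weights = [5.0, 1.0]
threshold = 2.0  # hypothetical value, fixed when the model is created
normal = [[10.0, 1.0] for _ in range(6)]
reactor_jump = normal + [[10.0, 1.0], [14.0, 1.0]]
atmospheric_jump = normal + [[10.0, 1.0], [10.0, 5.0]]

for series in (reactor_jump, atmospheric_jump):
    inputs, observed = split_windows(series, t=6, input_len=4, horizon=0, pred_len=2)
    predicted = persistence_predict(inputs, pred_len=2)
    print(is_anomaly(predicted, observed, weights, threshold))
# prints True for the reactor jump, then False for the atmospheric jump
```

The two test series contain a deviation of the same size, but only the one in the heavily weighted reactor pressure pushes the error over the threshold.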

History

In June 2019, we released Kaspersky MLAD, the product based on this technology. We are currently developing the technology in order to apply it in predictive analytics.
