In the years we have spent working with customers to remove operational inefficiencies by applying operational AI, we have seen several recurring challenges with conventional data science methods and have concluded that a different approach is needed.
Data collection and retention practices vary widely, but we see two common challenges.
Digital commerce and marketing platforms are designed to collect the kind of examples that support machine learning and fast decision making: they connect clicks with the possible factors that triggered them. Industrial operations usually lack the systems required to record ground truth (e.g., equipment failures and maintenance events) at this level of completeness.
Plants frequently lack a consistent data-entry process and nomenclature: the same event may be called different things depending on who is entering it and when it is being entered.
For example, “seal failure,” “door closure failure,” or simply “maintenance” may all describe the same event.
A single failure may also be split into multiple entries, making it very difficult to determine which events are distinct failure modes and which are actually a single failure.
For example, an out-of-spec process result may first be recorded as an out-of-cycle maintenance event because the technician on shift normally resolves all out-of-spec conditions this way. When early maintenance proves to be the wrong corrective action, the same event shows up as a recalibration. When recalibration does not resolve the problem, the same event shows up a third time as a part replacement.
Operations experts are very busy keeping operations running, and it is difficult to free up enough time in their schedules to provide diagnostic feedback or interpretations of problems that are not directly related to the fires they are putting out today. Because traditional analytics projects typically examine historical datasets, they necessarily focus on problems that occurred in the past. When forced to choose between working on the problem of the day or working on a problem that was solved seven months ago, the choice is generally, and unfortunately, easy to make.
We believe that a learn-as-you-go approach addresses these shortcomings. Figure 1 below illustrates the concept.
Fig 1 – Human guidance is used by the learning system to improve its observations and provide better predictions.
Fig 2 – An asset dashboard displays the status of various equipment or processes being monitored. Grey means no attention is needed. An orange alert means that a previously unseen condition has been detected. A red alert means a previously identified undesirable condition has been detected.
By using patterns to identify only interesting behaviors, the system minimizes the time users must spend reviewing data. This approach avoids the limitations of mining historical data for a well-documented set of interesting events, and it automatically adapts to changing plant conditions as new patterns emerge.
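The grey/orange/red triage described in Fig 2 can be sketched as a simple rule over a library of previously labeled conditions. Everything here is an illustrative assumption (the condition names, the distance-based matching, and the novelty threshold), not the actual implementation:

```python
# Illustrative sketch of the grey/orange/red alert triage.
# The condition library, distance metric, and threshold are assumptions
# for demonstration; a real system would learn these from plant data.
import math

# Previously labeled conditions: name -> (reference signature, notify flag)
CONDITION_LIBRARY = {
    "normal operation": ([0.0, 0.0, 0.0], False),
    "seal failure": ([1.0, 0.2, 0.8], True),
}

NOVELTY_THRESHOLD = 0.5  # assumed cutoff for "previously unseen"

def triage(signature):
    """Return 'grey', 'orange', or 'red' for a new pattern signature."""
    best_name, best_dist = None, float("inf")
    for name, (ref, _notify) in CONDITION_LIBRARY.items():
        dist = math.dist(signature, ref)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > NOVELTY_THRESHOLD:
        return "orange"  # previously unseen condition -> needs review
    if CONDITION_LIBRARY[best_name][1]:
        return "red"     # known undesirable condition -> alert
    return "grey"        # known benign behavior -> no attention needed
```

The key design point the sketch captures is that "orange" is not an error state: it is an invitation for the operations expert to review and label something new.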
Fig 3a – Raw time series data associated with the alert is displayed. In this case the data show a drop in the compressor suction temperature signal, a relatively sharp rise in the N2 filter DP signal, and a slow, gradual rise in the NDE primary seal leakage flow signal.
Fig 3b – Distribution of signal values per sensor (each a different chart) for a range of conditions (each a different colored line). In the example above, the dark blue condition shows a low value compared to other conditions in 4 of 6 sensors and a significantly shifted peak in 2 of 6 sensors.
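The per-sensor comparison in Fig 3b can be approximated with simple summary statistics. The sensor names, sample values, and z-score cutoff below are hypothetical, chosen only to mirror the compressor example:

```python
# Hypothetical sketch: flag sensors whose readings under an alerted
# condition are shifted relative to a baseline condition, mimicking the
# per-sensor distribution comparison in Fig 3b.
from statistics import mean, stdev

def shifted_sensors(baseline, condition, z_cutoff=2.0):
    """Return the sensors whose mean under `condition` deviates from the
    baseline mean by more than z_cutoff baseline standard deviations."""
    flagged = []
    for sensor, base_values in baseline.items():
        mu, sigma = mean(base_values), stdev(base_values)
        z = abs(mean(condition[sensor]) - mu) / sigma
        if z > z_cutoff:
            flagged.append(sensor)
    return flagged

# Hypothetical readings: suction temperature drops during the event,
# while the filter DP stays near its baseline.
baseline = {"suction_temp": [50, 51, 49, 50], "filter_dp": [1.0, 1.1, 0.9, 1.0]}
event = {"suction_temp": [44, 43, 44, 45], "filter_dp": [1.0, 1.05, 0.95, 1.0]}
print(shifted_sensors(baseline, event))  # ['suction_temp']
```

A production system would compare full distributions rather than means, but the principle is the same: show the expert which signals make this condition different.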
By comparing underlying behaviors between different time periods and signals, the operations expert can better understand the root cause and importance of each alert. This knowledge can be used not only to label the event but to take corrective action.
At this point, the operations expert will classify the event as a new condition by giving it a name. Alternatively, they may classify it as an existing condition by selecting from among the names they have already provided in earlier feedback reviews. If the event is interesting and they would like to be alerted when this type of event occurs again, they can set the system to generate notifications. Otherwise, events of this type will not be reported in the dashboard.
Fig 4 – The software provides a simple dialog box which allows the operations expert to label each event as it is detected or defer labeling until more information becomes available.
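The labeling choices just described (name a new condition, reuse an existing name, choose whether to be notified, or defer) could be modeled with a small record like the following. All field and function names are illustrative assumptions, not the product's API:

```python
# Illustrative sketch of the event-labeling feedback loop. The in-memory
# store and field names are assumptions; a real system would persist
# labels and feed them back to the learning algorithm.

conditions = {}  # name -> {"notify": bool}

def label_event(event, name, notify=False):
    """Attach an operations expert's label to a detected event.

    Reusing an existing name links the event to that known condition
    (keeping its notification setting); a new name registers a new
    condition in the library."""
    if name not in conditions:
        conditions[name] = {"notify": notify}
    event["condition"] = name
    return conditions[name]["notify"]  # will future matches alert?

def defer(event):
    """Leave the event unlabeled until more information is available."""
    event["condition"] = None
```

The essential behavior is that a label given once keeps paying off: every later event matched to the same condition inherits the name and the notification choice without further expert effort.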
This process of review and guidance gives the operations expert insight into the issues they are seeing today. It helps them address those issues more effectively while simultaneously improving the learning system’s performance.
We believe that the learn-as-you-go approach makes widespread adoption of predictive production operations feasible because it addresses key shortcomings in deploying such systems today.