What if your company could foresee equipment breakdowns and react before they happen? Predictive maintenance (PdM) is a proactive maintenance strategy that lets business leaders detect a potential maintenance problem and solve it before it actually occurs. This way, you perform maintenance on your own production schedule, avoid unexpected downtime, and increase the lifespan of your machinery.

Predictive maintenance systems built on machine learning (ML) are both effective and reliable. Trained on historical data, such a system keeps “learning” as new data arrives and picks up on even small deviations from the “normal” behavior of your equipment. This article covers the traditional ML techniques used to solve maintenance problems.

Supervised vs unsupervised learning in predictive maintenance

Based on the data collected, data scientists can address the maintenance problem using one of the two techniques:

Supervised learning if labeled failure events are present in the company’s dataset
Unsupervised learning if no labeled failure events are available in the dataset

Of course, this depends on the company’s maintenance policy: some businesses are not used to collecting any maintenance data at all, which rules out a supervised PdM solution for them. Nonetheless, if the company has collected at least some raw data from its equipment sensors, it can still use this data to build a powerful PdM solution by taking a mixed approach of supervised and unsupervised learning.

Supervised Learning based Predictive Maintenance

Data quality matters most when analyzing big data and building a robust, top-performing PdM solution. So, if the company has enough maintenance information and, importantly, that data is of good quality, supervised machine learning is a good starting point. Supervised ML problems are further divided into regression (predicting a continuous quantity) and classification (predicting a discrete class label).
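To make the regression/classification split concrete, here is a minimal sketch of how the same run-to-failure history could be labeled for either task. The function names and the `horizon` parameter are illustrative assumptions, not part of any particular PdM toolkit.

```python
def rul_targets(n_cycles):
    """Regression target: cycles remaining before failure, for each recorded cycle.

    Assumes the machine failed at the last recorded cycle.
    """
    return [n_cycles - 1 - t for t in range(n_cycles)]


def failure_labels(n_cycles, horizon):
    """Classification target: 1 if failure occurs within `horizon` cycles, else 0."""
    return [1 if rul <= horizon else 0 for rul in rul_targets(n_cycles)]


# A hypothetical machine that ran for 6 cycles and then failed.
print(rul_targets(6))        # [5, 4, 3, 2, 1, 0]
print(failure_labels(6, 2))  # [0, 0, 0, 1, 1, 1]
```

A regression model would be trained on the first kind of target, a classifier on the second; which one fits depends on whether the business needs a time estimate or just an early warning.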

But what data exactly would the company need to get started with a supervised learning-based predictive maintenance system?

The complete fault history, which should range from the normal equipment operation to its state during failures. The ML model should be able to follow the whole path from the normal operating state to the machine breakdown and train on both types of data to be able to make efficient predictions in the future.
The detailed history of maintenance and repairs, which will provide enough maintenance data for training the PdM model. This could include the information about replaced components as well as when and how the equipment or its components were fixed.
Machine conditions, such as information about aging patterns and anomalies that have led to reduced performance. Every piece of equipment has a limited lifetime. Still, we can extend its uptime by monitoring the health of the equipment and taking proactive measures before a failure actually happens.

Unsupervised Learning based Predictive Maintenance

Even if the company’s historical data contains no critical maintenance information, data engineers can still build a PdM solution using unsupervised ML techniques that detect anomalies in equipment behavior. As noted above, the main difference is that unsupervised learning-based solutions can work with unlabeled or raw data, whereas supervised learning depends on labeled data for training.
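As a rough illustration of anomaly detection on unlabeled data, the sketch below learns a baseline from readings assumed to be normal and flags new readings that deviate too far from it. The z-score rule, the threshold of 3, and the sample temperatures are assumptions chosen for the example; real systems often use more elaborate detectors.

```python
from statistics import mean, stdev


def fit_baseline(readings):
    """Summarize 'normal' behavior as the mean and sample standard deviation."""
    return mean(readings), stdev(readings)


def find_anomalies(readings, baseline, threshold=3.0):
    """Return indices of readings more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    return [i for i, x in enumerate(readings) if abs(x - mu) > threshold * sigma]


# Hypothetical sensor temperatures recorded during normal operation (no labels needed).
normal = [70.1, 69.8, 70.3, 70.0, 69.9, 70.2, 70.0, 69.7]
baseline = fit_baseline(normal)

new_readings = [70.1, 69.9, 74.5, 70.2]
print(find_anomalies(new_readings, baseline))  # [2]
```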

Both traditional ML techniques and deep learning algorithms are used to address predictive maintenance problems, depending on the complexity of the ML task. Below we discuss traditional ML approaches, which are a good place to start when planning a PdM solution.

Traditional Machine Learning techniques to build a Predictive Maintenance solution

Decision trees

This is a supervised learning method frequently used for classification problems. The structure of this algorithm resembles a tree, which explains its name. Specifically, each internal node marks a test on an attribute; a branch is associated with the result of the test; and a leaf node (a terminal node) stands for a class label.

To build a decision tree, a data engineer divides a source set into subsets based on an attribute value test. The same action is repeated for each derived subset in a recursive manner, a process known as recursive partitioning. The recursion is complete when all samples in the subset at a node share the same target value, or when further splitting no longer improves the predictions.
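Recursive partitioning can be sketched in a few lines. The version below is a simplified illustration, assuming numeric features, Gini impurity as the split criterion, and hypothetical sensor data (temperature, vibration); production libraries add pruning, depth limits, and much more.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, threshold, weighted_gini) of the best split, or None."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


def build_tree(rows, labels):
    if len(set(labels)) == 1:          # stop: every sample has the same target
        return labels[0]
    split = best_split(rows, labels)
    if split is None or split[2] >= gini(labels):   # stop: splitting no longer helps
        return Counter(labels).most_common(1)[0][0]
    f, threshold, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left]),
        "right": build_tree([r for r, _ in right], [y for _, y in right]),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical readings: [temperature, vibration] with a known outcome.
rows = [[60, 0.2], [65, 0.3], [70, 0.25], [90, 0.9], [95, 0.8], [85, 0.85]]
labels = ["ok", "ok", "ok", "fail", "fail", "fail"]
tree = build_tree(rows, labels)
print(predict(tree, [88, 0.7]))  # fail
```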

Use of decision trees in PdM

There are many ways this algorithm can be used in predictive maintenance. We consider one of them: determining the remaining useful life (RUL) of lithium-ion batteries.

These batteries are used in specific conditions and need a battery management system (BMS) to monitor the battery state and thereby ensure safety. Many ML methods have been applied to the RUL challenge, but they faced the following limitations:

The info hidden in the historical degradation status wasn’t reflected in the extracted features
Lack of precision or low accuracy of RUL prediction caused by nonlinearity

What actually worked as a solution was the combination of the time window (TW) and Gradient Boosting Decision Trees (GBDT). In this scenario:

The energy and fluctuation index of voltage signals were verified and selected as features
Then the features were extracted from the historical discharge process using a TW-based approach
Finally, GBDT was adopted to model the relation between the features and the RUL of lithium-ion batteries
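The time-window step can be sketched as follows. The article does not give formulas for the two features, so the definitions here are assumptions made purely for illustration: energy as the sum of squared voltage samples in a window, and the fluctuation index as the mean absolute change between consecutive samples. The voltage values are also hypothetical.

```python
def window_features(voltage, window, step):
    """Slide a fixed-size window over a voltage series and compute two features per window.

    Returns a list of (energy, fluctuation_index) tuples. `window` must be at least 2.
    """
    features = []
    for start in range(0, len(voltage) - window + 1, step):
        w = voltage[start:start + window]
        # Assumed definition: energy = sum of squared samples in the window.
        energy = sum(v * v for v in w)
        # Assumed definition: fluctuation index = mean absolute step between samples.
        fluctuation = sum(abs(b - a) for a, b in zip(w, w[1:])) / (window - 1)
        features.append((energy, fluctuation))
    return features


# Hypothetical discharge voltages sampled at regular intervals.
voltage = [4.0, 3.0, 3.0, 2.0, 2.0, 1.0]
feats = window_features(voltage, window=3, step=3)
print(feats)  # [(34.0, 0.5), (9.0, 0.5)]
```

Each feature tuple, paired with the known RUL at that point in the battery's life, would then form one training row for a gradient-boosted regressor.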

Pros and cons of decision trees

Pros	Cons
Easy data preparation during pre-processing	Lack of stability — the smallest change in data results in major changes in the decision tree structure
No need for data normalization and data scaling	Needs truly complex calculations in some cases
Missing values do not create any obstacle to using the algorithm	Expensive and time-consuming in training

Support Vector Machines (SVM)

This algorithm is widely used to address both classification and regression problems. The idea behind SVM is to create a line or a hyperplane in N-dimensional space (where N stands for the number of features) that distinctly classifies the data points and separates them into two classes.

Many different hyperplanes can separate the two classes of data points. Data engineers look for the hyperplane with the maximum margin, i.e. the maximum distance between the hyperplane and the nearest data points of each class. This allows future data points to be classified with more confidence.
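The margin idea can be checked numerically. The sketch below does not train an SVM; it only compares a few hand-picked candidate hyperplanes on toy 2-D data (all values are assumptions for illustration) and shows which one has the largest margin, which is the quantity an SVM solver maximizes.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    Labels are +1 or -1. A negative result means some point is misclassified.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Toy data: two linearly separable classes.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 3.0), (5.0, 3.0)]
labels = [-1, -1, 1, 1]

# Hand-picked candidate hyperplanes as (w, b) pairs.
candidates = {
    "x = 3": ((1.0, 0.0), -3.0),
    "x = 2.5": ((1.0, 0.0), -2.5),
    "x + y = 5": ((1.0, 1.0), -5.0),
}

margins = {
    name: geometric_margin(w, b, points, labels)
    for name, (w, b) in candidates.items()
}
best = max(margins, key=margins.get)
print(best)  # x + y = 5
```

All three candidates separate the classes, but the last one keeps the closest points farthest away, so it is the one a maximum-margin classifier would prefer among them.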

Use of SVM in PdM

Let’s consider the case of fault detection and diagnosis (FDD) of chillers as an example of how SVM is applied in PdM. As highly energy-consuming equipment, chillers provide cooling in buildings and need to be optimized in their usage.

The Least Squares Support Vector Machine (LS-SVM) model was created and optimized by cross-validation to perform FDD on a 90-ton centrifugal chiller. This was achieved in three steps:

The analysis of three system-level and four component-level faults
Validation and employment of eight fault-indicative features extracted from the original 64 parameters
Choice of the LS-SVM model based on its better results in overall diagnostics, detection rate, and false alarm rate as compared to other ML methods used

The data engineers who worked on the project were impressed with the prediction precision:

99.59% for refrigerant leak/undercharge
99.26% for refrigerant overcharge
99.38% for excessive oil

Pros and cons of SVM

Pros	Cons
Well suited to unstructured and semi-structured data	No probabilistic explanation for classification
Low risk of overfitting	No standard way to choose the kernel function
Good to use when there is a clear margin of separation between classes	Performs poorly on massive datasets
More effective in high-dimensional spaces	Not suitable when there is much noise in data or target classes are overlapping

K-Nearest Neighbors algorithm (KNN)

This is one more supervised ML algorithm that is well suited to both classification and regression problems. The idea behind it is similarity (proximity): similar data points lie close to each other. The algorithm computes the distance between a query and the examples in the data and then selects the K examples that are closest to the query. For a classification problem, the algorithm votes for the most frequent label among them. For a regression problem, the average of their labels is calculated.

When a new data point appears, it is assigned to the class most common among its K nearest neighbors, as measured by a distance function.
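Both the voting and the averaging variants fit in a short sketch. It assumes Euclidean distance and uses hypothetical sensor data; other distance functions work the same way.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """Return the labels of the k training examples closest to `query`."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    return [label for _, label in ranked[:k]]


def knn_classify(train, query, k):
    """Classification: majority vote among the k nearest labels."""
    return Counter(k_nearest(train, query, k)).most_common(1)[0][0]


def knn_regress(train, query, k):
    """Regression: average of the k nearest numeric labels."""
    return sum(k_nearest(train, query, k)) / k


# Hypothetical (vibration, temperature) readings with known condition.
condition_data = [
    ((0.20, 60.0), "healthy"),
    ((0.30, 62.0), "healthy"),
    ((0.25, 61.0), "healthy"),
    ((0.90, 80.0), "faulty"),
    ((0.80, 82.0), "faulty"),
]
print(knn_classify(condition_data, (0.85, 79.0), k=3))  # faulty

# Hypothetical (load,) readings with a numeric target.
wear_data = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]
print(knn_regress(wear_data, (2.1,), k=2))  # 25.0
```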

Use of KNN in PdM

The case study on the diagnosis of electric traction motors exemplifies a wide application of KNN in predictive maintenance. Multiple operational conditions, such as variable load or rotational speed, characterize how this type of motor works. The diversity of these factors complicates diagnosing the bearing defects, including detecting the onset of degradation, isolating the degrading bearing, and classifying defect types.

This classification problem was nevertheless addressed by building a diagnostic system based on a hierarchical structure of KNN classifiers. Data scientists used previously measured vibration signals as input, while the development of the bearing diagnostic system combined Multi-Objective (MO) optimization with the integration of Binary Differential Evolution (BDE) and KNN. Although this approach was used with experimental datasets, the results were promising enough to use in a real-life environment.

Pros and cons of KNN

Pros	Cons
No training step: the algorithm simply stores the training data and defers all computation to prediction time	Not well suited to large datasets or many dimensions, and sensitive to unbalanced datasets, missing values, outliers, and noisy data
Opportunity to add data easily, and this won’t affect the overall accuracy	“K” in the algorithm needs to be determined in advance
Easy implementation	Needs feature scaling
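The "needs feature scaling" point is easy to demonstrate. In the sketch below (hypothetical vibration and RPM readings, min-max scaling chosen as one common option), the nearest neighbor of the same machine changes once both features are brought to a comparable range, because the large RPM values no longer dominate the distance.

```python
import math


def min_max_scale(rows):
    """Rescale every feature column to the [0, 1] range."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def nearest_index(rows, query_index):
    """Index of the row closest (Euclidean) to rows[query_index], excluding itself."""
    others = [i for i in range(len(rows)) if i != query_index]
    return min(others, key=lambda i: math.dist(rows[i], rows[query_index]))


# Hypothetical readings: [vibration (mm/s), rotational speed (RPM)].
rows = [[0.2, 1000], [0.9, 1010], [0.25, 1100], [0.5, 2000]]

print(nearest_index(rows, 0))                 # 1: RPM dominates the raw distance
print(nearest_index(min_max_scale(rows), 0))  # 2: vibration now counts too
```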

Wrap up

In this article, we discussed three of the most popular machine learning algorithms used to solve predictive maintenance problems across different industries. There is no one-size-fits-all algorithm that suits every situation. Instead, data engineers should choose the algorithm carefully, step by step, to achieve effective results.

If you’re wondering how to get started with predictive maintenance and how to build ML expertise in your organization, we recommend reading our 21-page white paper on predictive maintenance. We hope you find it insightful and that it helps your company reduce downtime and optimize business operations.

AIJ Guest Post 15 April 2022

2 Comments

Alex says:

18 April 2022 at 7:29 AM

It would be nice to see more detailed about the integration of Binary Differential Evolution (BDE) with KNN
1. Alexander Barinov says:
  
  10 May 2022 at 11:50 AM
  
  Thanks for the comment, Alex, I’ll try to cover this topic in further articles. Meanwhile you might read other articles on solving predictive maintenance challenge with ML in this series: https://www.ai.intelliarts.com/post/predictive-maintenance-in-manufacturing

Using a Traditional Machine Learning approach for Predictive Maintenance

By Alexander Barinov, Managing Partner at Intelliarts