Detect anomalies in manufacturing data using Amazon SageMaker Canvas

With the use of cloud computing, big data and machine learning (ML) tools like Amazon Athena or Amazon SageMaker have become available and useable by anyone without much effort in creation and maintenance. Industrial companies increasingly look at data analytics and data-driven decision-making to increase resource efficiency across their entire portfolio, from operations to performing predictive maintenance or planning. Due to the velocity of change in IT, customers in traditional industries are facing a dilemma of skillset. On the one hand, analysts and domain experts have a very deep knowledge of the data in question and its interpretation, yet often lack the exposure to data science tooling and high-level programming languages such as Python. On the other hand, data science experts often lack the experience to interpret the machine data content and filter it for what is relevant. This dilemma hampers the creation of efficient models that use data to generate business-relevant insights. Amazon SageMaker Canvas addresses this dilemma by providing domain experts a no-code interface to create powerful analytics and ML models, such as forecasts, classification, or regression models. It also allows you to deploy and share these models with ML and MLOps specialists after creation. In this post, we show you how to use SageMaker Canvas to curate and select the right features in your data, and then train a prediction model for anomaly detection, using the no-code functionality of SageMaker Canvas for model tuning. Anomaly detection for the manufacturing industry At the time of writing, SageMaker Canvas focuses on typical business use cases, such as forecasting, regression, and classification. For this post, we demonstrate how these capabilities can also help detect complex abnormal data points. This use case is relevant, for instance, to pinpoint malfunctions or unusual operations of industrial machines. Anomaly detection is important in the industry domain, because machines (from trains to turbines) are normally very reliable, with times between failures spanning years. Most data from these machines, such as temperature senor readings or status messages, describes the normal operation and has limited value for decision-making. Engineers look for abnormal data when investigating root causes for a fault or as warning indicators for future faults, and performance managers examine abnormal data to identify potential improvements. Therefore, the typical first step in moving towards data-driven decision-making relies on finding that relevant (abnormal) data. In this post, we use SageMaker Canvas to curate and select the right features in data, and then train a prediction model for anomaly detection, using SageMaker Canvas no-code functionality for model tuning. Then we deploy the model as a SageMaker endpoint. Solution overview For our anomaly detection use case, we train a prediction model to predict a characteristic feature for the normal operation of a machine, such as the motor temperature indicated in a car, from influencing features, such as the speed and recent torque applied in the car. For anomaly detection on a new sample of measurements, we compare the model predictions for the characteristic feature with the observations provided. For the example of the car motor, a domain expert obtains measurements of the normal motor temperature, recent motor torque, ambient temperature, and other potential influencing factors. These allow you to train a model to predict the temperature from the other features. Then we can use the model to predict the motor temperature on a regular basis. When the predicted temperature for that data is similar to the observed temperature in that data, the motor is working normally; a discrepancy will point to an anomaly, such as the cooling system failing or a defect in the motor. The following diagram illustrates the solution architecture. The solution consists of four key steps: The domain expert creates the initial model, including data analysis and feature curation using SageMaker Canvas. The domain expert shares the model via the Amazon SageMaker Model Registry or deploys it directly as a real-time endpoint. An MLOps expert creates the inference infrastructure and code translating the model output from a prediction into an anomaly indicator. This code typically runs inside an AWS Lambda function. When an application requires an anomaly detection, it calls the Lambda function, which uses the model for inference and provides the response (whether or not it’s an anomaly). Prerequisites To follow along with this post, you must meet the following prerequisites: Create the model using SageMaker The model creation process follows the standard steps to create a regression model in SageMaker Canvas. For more information, refer to Getting started with using Amazon SageMaker Canvas. First, the domain expert loads relevant data into SageMaker Canvas, such as a time series of measurements. For this post, we use a CSV file containing the (synthetically generated) measurements of an electrical motor. For details, refer to Import data into Canvas. The sample data used is available for download as a CSV. Curate the data with SageMaker Canvas After the data is loaded, the domain expert can use SageMaker Canvas to curate the data used in the final model. For this, the expert selects those columns that contain characteristic measurements for the problem in question. More precisely, the expert selects columns that are related to each other, for instance, by a physical relationship such as a pressure-temperature curve, and where a change in that relationship is a relevant anomaly for their use case. The anomaly detection model will learn the normal relationship between the selected columns and indicate when data doesn’t conform to it, such as an abnormally high motor temperature given the current load on the motor. In practice, the domain expert needs to select a set of suitable input columns and a target column. The inputs are typically the collection of quantities (numeric or categorical) that determine a machine’s behavior, from demand settings, to load, speed, or ambient temperature. The output is typically a numeric quantity that indicates the performance of the machine’s operation, such as a temperature measuring energy dissipation or another performance metric changing when the machine runs under suboptimal conditions. To illustrate the concept of what quantities to select for input and output, let’s consider a few examples: For rotating equipment, such as the model we build in this post, typical inputs are the rotation speed, torque (current and history), and ambient temperature, and the targets are the resulting bearing or motor temperatures indicating good operational conditions of the rotations For a wind turbine, typical inputs are the current and recent history of wind speed and rotor blade settings, and the target quantity is the produced power or rotational speed For a chemical process, typical inputs are the percentage of different ingredients and the ambient temperature, and targets are the heat produced or the viscosity of the end product For moving equipment such as sliding doors, typical inputs are the power input to the motors, and the target value is the speed or completion time for the movement For an HVAC system, typical inputs are the achieved temperature difference and load settings, and the target quantity is the energy consumption measured Ultimately, the right inputs and targets for a given equipment will depend on the use case and anomalous behavior to detect, and are best known to a domain expert who is familiar with the intricacies of the specific dataset. In most cases, selecting suitable input and target quantities means selecting the right columns only and marking the target column (for this example, bearing_temperature). However, a domain expert can also use the no-code features of SageMaker Canvas to transform columns and refine or aggregate the data. For instance, you can extract or filter specific dates or timestamps from the data that are not relevant. SageMaker Canvas supports this process, showing statistics on the quantities selected, allowing you to understand if a quantity has outliers and spread that may affect the results of the model. Train, tune, and evaluate the model After the domain expert has selected suitable columns in the dataset, they can train the model to learn the relationship between the inputs and outputs. More precisely, the model will learn to predict the target value selected from the inputs. Normally, you can use the SageMaker Canvas Model Preview option. This provide a quick indication of the model quality to expect, and allows you to investigate the effect that different inputs have on the output metric. For instance, in the following screenshot, the model is most affected by the motor_speed and ambient_temperature metrics when predicting…

Leave a Reply

Your email address will not be published. Required fields are marked *

National Random Acts of Kindness Day 2024

National Random Acts of Kindness Day 2024

Some people might think that National Random Acts of Kindness Day was started to

Charting The GBS Course For 2024: Accelerating And Sustaining Success | Webinar

Charting The GBS Course For 2024: Accelerating And Sustaining Success | Webinar

GBS (Global Business Services) organizations have established themselves as

You May Also Like