What Are Surrogate Models and How Are They Used?

Surrogate models are simplified approximations that estimate the behavior of complex systems. They act as a faster, more efficient stand-in for detailed models, capturing the system’s key features and responses. This approach allows for manageable analysis of intricate problems that would otherwise be too demanding to study directly. The general purpose of a surrogate model is to make complex analyses more practical, enabling quicker predictions and informed decisions.

The Need for Surrogate Models

Directly using complex, high-fidelity models often presents significant challenges, making surrogate models a necessary tool. One primary issue is the computational time required for a single simulation, which can range from minutes to hours or even days. For instance, optimizing an aircraft wing’s shape involves simulating airflow for many combinations of design variables such as length, curvature, and material, and each simulation can take considerable time. Tasks such as design optimization, design-space exploration, and sensitivity analysis might require thousands or even millions of evaluations, so at that cost per run they become impractical.
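
A back-of-the-envelope calculation makes the scale concrete. Every figure in the sketch below (a 2-hour simulation, 10,000 evaluations, 200 training runs, a 1-millisecond surrogate call) is an illustrative assumption, not a measurement from any particular solver.

```python
# Illustrative numbers only: all four constants are assumptions.
SIM_HOURS_PER_RUN = 2.0             # assumed cost of one high-fidelity simulation
N_EVALUATIONS = 10_000              # assumed evaluations an optimizer would request
N_TRAINING_RUNS = 200               # assumed simulations used to train a surrogate
SURROGATE_SECONDS_PER_CALL = 1e-3   # assumed cost of one surrogate prediction


def direct_cost_hours(n_evals, hours_per_run):
    """Total compute if every evaluation runs the full simulation."""
    return n_evals * hours_per_run


def surrogate_cost_hours(n_train, hours_per_run, n_evals, seconds_per_call):
    """Compute spent on training runs plus all surrogate predictions."""
    training = n_train * hours_per_run
    prediction = n_evals * seconds_per_call / 3600.0
    return training + prediction


direct = direct_cost_hours(N_EVALUATIONS, SIM_HOURS_PER_RUN)
with_surrogate = surrogate_cost_hours(
    N_TRAINING_RUNS, SIM_HOURS_PER_RUN, N_EVALUATIONS, SURROGATE_SECONDS_PER_CALL
)

print(f"direct simulation: {direct:.0f} hours")
print(f"surrogate workflow: {with_surrogate:.1f} hours")
```

Under these assumptions almost all of the surrogate workflow's cost is the training runs; the predictions themselves are close to free.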

High computational costs further hinder direct analysis, especially when large datasets or specialized hardware are needed. Slow turnaround also prevents rapid iteration, which limits how many scenarios can be explored. Surrogate models address these limitations by providing a computationally cheap alternative that mimics the behavior of the original simulation model, making exploration and optimization feasible where they would otherwise be too resource-intensive.

Building and Using Surrogate Models

Creating and utilizing a surrogate model begins with generating training data from the complex, high-fidelity model. This involves running the complex simulation at a limited number of intelligently selected input points, recording both inputs and their corresponding outputs. These input-output pairs form the dataset used to teach the surrogate model the system’s underlying relationships. Techniques like Latin hypercube sampling are often employed to efficiently cover the input parameter space without requiring excessive data samples.
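
The sampling step can be sketched in a few lines. This is a minimal hand-written Latin hypercube sampler using NumPy, not a library implementation; the function `expensive_simulation`, the two input ranges, and the sample count of 20 are hypothetical placeholders for a real high-fidelity model and its parameters.

```python
import numpy as np


def latin_hypercube(n_samples, bounds, rng):
    """Return an (n_samples, n_dims) array of Latin hypercube points.

    bounds is a sequence of (low, high) pairs, one per input dimension.
    """
    bounds = np.asarray(bounds, dtype=float)
    n_dims = bounds.shape[0]
    # One random point inside each of n_samples equal-width strata of [0, 1).
    strata = np.arange(n_samples)[:, None]
    unit = (strata + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle each column independently so the dimensions are decoupled.
    for d in range(n_dims):
        unit[:, d] = rng.permutation(unit[:, d])
    low, high = bounds[:, 0], bounds[:, 1]
    return low + unit * (high - low)


def expensive_simulation(X):
    """Hypothetical stand-in for the high-fidelity model (cheap on purpose)."""
    return np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2


rng = np.random.default_rng(0)
X_train = latin_hypercube(20, [(0.0, 2.0), (-1.0, 1.0)], rng)
y_train = expensive_simulation(X_train)  # the recorded input-output pairs
print(X_train.shape, y_train.shape)
```

Each input dimension is split into 20 equal slices and every slice receives exactly one sample, which is what lets a small number of runs cover the whole range of each variable.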

Once training data is collected, the next step involves “training” the surrogate model. This means building a statistical model that learns to approximate the output of the complex simulation based on the provided input-output pairs. The goal is to capture the original model’s behavior as closely as possible, while being faster to evaluate. This training process is a form of supervised machine learning, where the surrogate model learns a mapping between design parameters and performance criteria.
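
As a rough illustration of the training step, the sketch below fits a Gaussian radial basis function surrogate to a handful of input-output pairs and checks it against held-out points. The one-dimensional `expensive_simulation`, the length scale of 0.3, and the small ridge term are assumptions chosen for the example; a real project would tune them or use an established library.

```python
import numpy as np


def expensive_simulation(x):
    """Hypothetical stand-in for the high-fidelity model."""
    return np.sin(3.0 * x) + 0.5 * x ** 2


def fit_rbf_surrogate(x_train, y_train, length_scale=0.3, ridge=1e-8):
    """Fit Gaussian RBF weights by solving one small linear system."""
    diff = x_train[:, None] - x_train[None, :]
    K = np.exp(-0.5 * (diff / length_scale) ** 2)
    # The ridge term keeps the system numerically well behaved.
    weights = np.linalg.solve(K + ridge * np.eye(len(x_train)), y_train)

    def predict(x_new):
        d = np.asarray(x_new, dtype=float)[:, None] - x_train[None, :]
        return np.exp(-0.5 * (d / length_scale) ** 2) @ weights

    return predict


# Training data: 12 runs of the (pretend) expensive model.
x_train = np.linspace(0.0, 2.0, 12)
y_train = expensive_simulation(x_train)
surrogate = fit_rbf_surrogate(x_train, y_train)

# Held-out points that were not used for fitting.
x_val = np.linspace(0.05, 1.95, 50)
residual = surrogate(x_val) - expensive_simulation(x_val)
rmse = np.sqrt(np.mean(residual ** 2))
print(f"validation RMSE: {rmse:.2e}")
```

Measuring error on points outside the training set is the design choice that matters here: a surrogate that interpolates its own training data perfectly can still be wrong between samples.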

After training, the surrogate model can be used for predictions or simulations in place of the original complex model. It allows for rapid estimation of outcomes for new input parameters, avoiding expensive simulations. The accuracy of the surrogate model depends on the number and placement of the training samples. If the initial accuracy is not sufficient, the process can be refined by adding more data points and retraining the model until the desired accuracy is achieved.
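
The refine-and-retrain loop might look like the following sketch. To keep it short, the surrogate is plain piecewise-linear interpolation (`np.interp`) rather than a realistic model, and `expensive_simulation`, the tolerance of 0.05, and the midpoint refinement rule are all assumptions made for illustration.

```python
import numpy as np


def expensive_simulation(x):
    """Hypothetical stand-in for the high-fidelity model."""
    return np.sin(3.0 * x) + 0.5 * x ** 2


def fit_surrogate(x_train, y_train):
    """Deliberately simple surrogate: piecewise-linear interpolation.

    x_train must be sorted in increasing order.
    """
    return lambda x_new: np.interp(x_new, x_train, y_train)


TOLERANCE = 0.05   # assumed accuracy target (max absolute error)
MAX_ROUNDS = 6     # safety cap on refinement rounds

# Separate validation runs, evaluated once with the expensive model.
x_val = np.linspace(0.0, 2.0, 101)
y_val = expensive_simulation(x_val)

x_train = np.linspace(0.0, 2.0, 5)
y_train = expensive_simulation(x_train)

for round_index in range(MAX_ROUNDS):
    surrogate = fit_surrogate(x_train, y_train)
    max_error = np.max(np.abs(surrogate(x_val) - y_val))
    if max_error <= TOLERANCE:
        break
    # Not accurate enough: run the expensive model only at the midpoints
    # between existing samples, then merge and retrain on the next pass.
    x_new = 0.5 * (x_train[:-1] + x_train[1:])
    x_all = np.concatenate([x_train, x_new])
    y_all = np.concatenate([y_train, expensive_simulation(x_new)])
    order = np.argsort(x_all)
    x_train, y_train = x_all[order], y_all[order]

print(f"{len(x_train)} training runs, max validation error {max_error:.4f}")
```

Earlier simulation results are kept and only the new points are evaluated, so each refinement round pays for the added samples alone.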

Real World Applications

Surrogate models are applied across many fields. In engineering design, they are widely used to optimize product performance, such as improving car aerodynamics or designing aircraft wings. For example, in the aerospace industry, surrogate models can predict distributions of stresses or temperature fields across different designs, which supports design optimization. This allows engineers to iterate on designs quickly, shortening product development cycles.

In climate modeling, surrogate models enable predictions of environmental impacts. They can be used to create data-driven models of geofluids and to correct errors in physical models, allowing for forecasts of quantities like geopotential fields. Although their use in long-term climate modeling is still developing, these models help in understanding complex atmospheric and oceanic processes.

Drug discovery also benefits from surrogate modeling, particularly in the early stages of development. Physiologically based pharmacokinetic (PBPK) models, which can act as surrogates, help in early risk assessment, predicting human doses, and screening potential compounds. These models aid in decision-making by providing insights into how drugs are absorbed, distributed, metabolized, and excreted.

Financial modeling uses surrogate models for tasks like risk assessment and financial forecasting. For instance, Support Vector Machine (SVM) models, a type of surrogate, handle high-dimensional data well; in their regression form they predict continuous values, an approach that has also been applied in areas like energy load prediction. This allows for evaluations of complex financial scenarios and helps in identifying potential outliers or anomalies, such as in fraud detection.

Varieties of Surrogate Models

There are several categories of surrogate models, each with distinct approaches to approximating complex systems. Statistical methods represent one broad category, which includes techniques like Response Surface Methodology (RSM) and polynomial regression. RSM, for instance, often uses polynomial equations to fit input-output relationships, capturing curvature in data for process optimization in systems with moderately nonlinear responses. These methods focus on empirically modeling the relationship between inputs and outputs.
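
A second-order response surface can be fitted with ordinary least squares, as in the sketch below. The two-factor process, its coefficients, the noise level, and the 3-level factorial design in coded units are all made up for illustration.

```python
import numpy as np


def quadratic_features(X):
    """Columns: 1, x1, x2, x1^2, x2^2, x1*x2 (full second-order model)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])


def process_response(X):
    """Hypothetical process with curvature; the coefficients are invented."""
    x1, x2 = X[:, 0], X[:, 1]
    return 5.0 + 2.0 * x1 - 1.0 * x2 - 1.5 * x1**2 - 0.5 * x2**2 + 0.8 * x1 * x2


# 3-level full factorial design in coded units (-1, 0, +1): 9 runs.
levels = np.array([-1.0, 0.0, 1.0])
X_design = np.array([[a, b] for a in levels for b in levels])
rng = np.random.default_rng(1)
y_obs = process_response(X_design) + rng.normal(0.0, 0.01, size=len(X_design))

# Least-squares fit of the six polynomial coefficients.
coef, *_ = np.linalg.lstsq(quadratic_features(X_design), y_obs, rcond=None)


def response_surface(X):
    """Evaluate the fitted polynomial at new points."""
    return quadratic_features(np.atleast_2d(X)) @ coef


# Stationary point of the fitted quadratic: solve gradient = b + 2 B x = 0.
b = coef[1:3]
B = np.array([[coef[3], coef[5] / 2.0], [coef[5] / 2.0, coef[4]]])
x_stationary = -0.5 * np.linalg.solve(B, b)

print("fitted coefficients:", np.round(coef, 3))
print("stationary point:", np.round(x_stationary, 3))
```

Because the fitted model is a quadratic, its optimum has a closed form, which is a large part of why polynomial surfaces remain popular for process optimization.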

Machine learning-based approaches form another significant group, leveraging algorithms to learn complex relationships from data. Examples include Radial Basis Functions (RBF), Support Vector Machines (SVM), Gaussian Processes (also known as Kriging), and Artificial Neural Networks. Gaussian Processes are probabilistic models that can handle complex systems and provide uncertainty estimates alongside predictions, while neural networks are capable of capturing highly nonlinear relationships. These models are trained on collected data to predict outcomes.
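
The uncertainty estimate that distinguishes Gaussian Processes can be shown with a small hand-written zero-mean GP. This is a sketch with fixed, assumed hyperparameters (length scale 0.4, unit variance, noise 1e-6) rather than fitted ones, and `expensive_simulation` is again a hypothetical stand-in.

```python
import numpy as np


def expensive_simulation(x):
    """Hypothetical stand-in for the high-fidelity model."""
    return np.sin(3.0 * x) + 0.5 * x ** 2


def rbf_kernel(a, b, length_scale=0.4, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D points."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_predict(x_train, y_train, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_cross = rbf_kernel(x_train, x_new)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_cross.T @ alpha
    v = np.linalg.solve(L, K_cross)
    prior_var = rbf_kernel(x_new, x_new).diagonal()
    var = prior_var - np.sum(v ** 2, axis=0)
    # Clip tiny negative values caused by floating-point round-off.
    return mean, np.sqrt(np.clip(var, 0.0, None))


x_train = np.array([0.0, 0.4, 0.9, 1.3, 2.0])
y_train = expensive_simulation(x_train)

# A training point, a gap between samples, and a point outside the data.
x_new = np.array([0.4, 1.65, 3.0])
mean, std = gp_predict(x_train, y_train, x_new)
for x, m, s in zip(x_new, mean, std):
    print(f"x = {x:.2f}: prediction {m:.3f}, std {s:.3f}")
```

The predicted standard deviation is near zero at a sampled input and grows with distance from the data, which is the signal commonly used to decide where the next expensive simulation should be run.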

A developing area involves physics-informed models, which integrate knowledge of underlying physical laws into the model’s learning process. Unlike purely data-driven “black-box” models that rely solely on input-output pairs, physics-informed models incorporate governing physical equations to enhance accuracy and reliability. This approach is particularly useful when data is limited, as the physical constraints guide the model’s learning, leading to more robust predictions, especially for extrapolation.
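
A toy version of the idea, assuming the governing law is the decay equation du/dt = -k u with a known k: the sketch fits the same polynomial model twice, once to three data points alone and once with extra least-squares rows that penalize the equation's residual at collocation points. The equation, the value of k, the polynomial degree, and the physics weight are all assumptions for illustration; practical physics-informed models usually use neural networks and more complex equations.

```python
import numpy as np

K_DECAY = 1.5         # assumed known constant in du/dt = -K_DECAY * u
DEGREE = 5            # polynomial surrogate u(t) ~ sum_j c_j * t**j
PHYSICS_WEIGHT = 1.0  # assumed relative weight of the physics residual


def basis(t):
    """Polynomial basis values, shape (len(t), DEGREE + 1)."""
    return np.vander(t, DEGREE + 1, increasing=True)


def basis_derivative(t):
    """Time derivative of each basis column."""
    powers = np.arange(DEGREE + 1)
    d = np.zeros((len(t), DEGREE + 1))
    d[:, 1:] = powers[1:] * t[:, None] ** (powers[1:] - 1)
    return d


# Only three noise-free data points, all early in time.
t_data = np.array([0.0, 0.25, 0.5])
u_data = np.exp(-K_DECAY * t_data)

# Data-only fit (minimum-norm solution, since data are scarce).
A_data = basis(t_data)
c_data_only, *_ = np.linalg.lstsq(A_data, u_data, rcond=None)

# Physics-informed fit: add rows asking u' + K_DECAY * u = 0 at
# collocation points, including times where no data exist.
t_phys = np.linspace(0.0, 2.0, 20)
A_phys = basis_derivative(t_phys) + K_DECAY * basis(t_phys)
A = np.vstack([A_data, PHYSICS_WEIGHT * A_phys])
rhs = np.concatenate([u_data, np.zeros(len(t_phys))])
c_physics, *_ = np.linalg.lstsq(A, rhs, rcond=None)

# Compare both fits at t = 2, well outside the data range.
t_test = np.array([2.0])
truth = np.exp(-K_DECAY * t_test)[0]
err_data_only = abs((basis(t_test) @ c_data_only)[0] - truth)
err_physics = abs((basis(t_test) @ c_physics)[0] - truth)
print(f"extrapolation error, data only:        {err_data_only:.3f}")
print(f"extrapolation error, physics-informed: {err_physics:.3f}")
```

Both fits match the three data points, but only the physics-informed one has any information about times beyond them, so the gap shows up in extrapolation.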