Is Logistic Regression Linear? The Full Answer

Logistic regression is a widely used statistical modeling technique. It allows researchers to predict the probability of a specific binary outcome, such as the presence or absence of a disease, or the success or failure of an experiment. A common point of confusion arises regarding its fundamental nature: Is logistic regression truly linear? This question delves into the mathematical underpinnings of the model, revealing a nuanced answer that combines both linear and non-linear characteristics.

What Logistic Regression Does

Logistic regression serves the purpose of predicting the likelihood that an observation belongs to one of two categories. Unlike linear regression, which forecasts a continuous numerical value, logistic regression is tailored for situations where the outcome is categorical. For instance, a biologist might use it to predict the probability of a plant surviving in a new environment based on factors like soil pH, light exposure, and water availability. The model outputs a probability value, typically ranging from 0 to 1, which represents the chance of a particular event occurring. This output is then often used to classify an observation into one of the two possible outcomes.

The “Linear” Component: How Inputs Are Combined

The “linear” aspect of logistic regression stems from how it combines the input variables. Before any non-linear transformation occurs, the model calculates a weighted sum of the independent variables. This process involves multiplying each input feature by a corresponding coefficient, or weight, and then adding these products together. An intercept term is also included in this sum, similar to the structure of a simple linear equation.

This intermediate result, often referred to as the “logit” or “log-odds,” represents a linear combination of the predictor variables. For example, if predicting the probability of disease, each risk factor (like age or exposure level) would have a specific weight, and their weighted sum would form this linear component. This step ensures that the influence of each input variable on the underlying score is additive and directly proportional to its assigned weight. Therefore, in terms of its parameters, the model is indeed linear, as these coefficients are estimated in a linear fashion.

The “Non-Linear” Transformation: The Sigmoid Curve

Following the linear combination of inputs, logistic regression employs a non-linear transformation through the sigmoid function. Also known as the logistic function, this mathematical operation takes the linear sum and squashes it into a probability value between 0 and 1. The sigmoid function produces an characteristic S-shaped curve, mapping any real-valued number input into this constrained output range. This transformation is essential because probabilities must naturally fall within 0 and 1, something a purely linear function cannot guarantee.

Small changes in the linear combination near the center of the curve lead to larger changes in probability, while changes at the extremes result in smaller shifts. This contrasts with a straight line, where a constant change in input always yields a constant change in output. The sigmoid function effectively converts the raw, unbounded linear score into a meaningful probability, making the model suitable for binary classification tasks.

Is Logistic Regression Linear? The Full Answer

Synthesizing these two distinct components, whether logistic regression is linear becomes clear. Logistic regression is considered “linear in its parameters” because the initial step involves a linear combination of the input variables and their coefficients. This means the relationship between the independent variables and the log-odds of the outcome is linear.

However, the model is “non-linear in its predictions” due to the application of the sigmoid function. This non-linear transformation converts the linear log-odds into a probability, which is the model’s ultimate output. This dual nature—linear in its internal structure (parameters) and non-linear in its output (probabilities)—is why logistic regression is often classified as a type of generalized linear model. This design allows it to produce interpretable insights into how predictors influence the likelihood of an event while ensuring the predictions are valid probabilities.