Least squares linear regression in Excel is easy. That’s what the Linest and Trend functions do. That is, they find the coefficients of a straight line (or higher dimension shape) so that the sum of the squares of the distances of each data point from the line is a minimum. However, if we want to use weighted data (give the values at some points more importance than others), there are no built in functions to do the job. This post looks at various options, including using Linest with modified input, VBA user defined functions (UDFS) and UDFs using the Alglib and Python Scipy libraries. All the code used is free and open source, and may be downloaded from:
WeightLSq.xlsb (VBA and Linest versions)
xl_Spline-Matrix2.zip (Alglib version)
xlScipy-xlw.zip (Python Scipy version)
The first spreadsheet uses VBA only. The other two both use Python and require Python and xlwings to be installed. The Alglib and Python downloads also include wide variety of other functions.
The screen shot below shows some sample data with two known variables (X1 and X2) and an associated value Y. We wish to find a linear equation of the form:
Y = A + B.X1 + C.X2
that fits the given Y values as closely as possible.
The first two results use the Linest function with no weighting. In the first formula, the optional Const value is omitted, so Const is set to True, and the equation constant value (A) is calculated. For this case the X range only requires the X1 and X2 values (columns B and C). In the second formula Const is entered as False, so an additional column is required with X = 1.
Note that Linest returns the coefficients in reverse order, so our equation is given by:
Y = 22.12 + 0.0137X1 – 1.032X2
Linest can be used with weighted data by applying the weights to both the X and Y data, but to return the correct results the following points are important:
- The weighted error values for each point are squared, so the weights are also squared, but the standard definition for weighted least squares applies the weight to the squared errors (see weighted linear least squares). The Linest function should therefore be passed the square root of the weights.
- The weights must also be applied to the intercept data (the column of ones), so the Linest Const value must be set to False, and the intercept column included in the X data, as for the second unweighted example.
The table of weight square roots may either be generated on the spreadsheet (Weighted Linest 1 above), or the square root can be applied within the Linest formula (Weighted Linest 2).
Results of VBA functions performing the least squares calculations (unweighted and weighted) are shown below:
Full open source code is included in the download file. Note that the results are identical to those found by Linest, but returned in the reverse order. For the weighted analysis the column of full weights is used as the input data.
The two screenshots below show an on-sheet calculation, using the same method as used in the VBA functions. These are also included in the download file.
The Python Scipy library includes a least squares function, which is included in the xlw-SciPy spreadsheet. A weighted version has now been added:
The Alglib library also has a least squares function, including both unweighted and weighted versions:
In the Alglib weighted function the weights are squared, as well as the error values (see Note 4 here). The square root of the weights should therefore be used for consistency with the other functions.