Using Linest for non-linear curve fitting, examples, hints and warnings

Since I wrote Using LINEST for non-linear curve fitting in 2011 it has been by far the most popular post on this blog.  This post (in response to a recent question) provides some more detailed guidance on how to apply the function and use the results.  It also provides some examples of using the related TREND function, and provides some general hints and warnings about using curve-fitting on a limited data set, particularly for the purposes of extrapolation.

The spreadsheet used in this post, and the original post, can be downloaded from Linest-poly-example.xls and Linest-poly.xls.

The data used in the examples is three points relating to the value of a futures contract.  Please note that I do not know the details of how futures are priced.  I have treated this question as simply one of fitting different functions to a given data set; in this case just three points, but the same procedures apply to any size of data.

The simplest approach is to treat the data as linear, and use the Linest function as documented in the Excel help.  As in all the examples, the data consists of the columns of X data (A5:A7) and Y data (B5:B7).  These two ranges have been named X_1 and Y_1, and the range names are used in all the examples:

Linest5-1

The Linest function returns the slope and Y intercept of the straight line that most closely fits the data (values a and b above).  Note that the function is entered in a single cell, and must be entered as an array function to display both results:

  • Enter the function as usual
  • Select the output range (A28:B28 in the example above)
  • Press F2, then Ctrl-Shift-Enter

As can be seen above, the Linest function returns exactly the same results as using the trend line display on an XY graph.

To calculate the Y value for any given X enter the formula:
= a*X + b
where a, X, and b are the cell addresses containing the appropriate values.

The Trend function combines both steps in a single function, and returns exactly the same end result:
= Trend(Y_1, X_1, x)

The Linest and Trend functions will also work with non-linear data:

Linest5-2

In this example the Y values in the Linest function have been replaced with their natural logarithm (using the LN function).   The resulting “a” coefficient is exactly equal to the power coefficient returned by the chart trend line results for an exponential curve.  To find the initial factor (1288.886) the “b” coefficient is entered in the Exp function.  For interpolation the “a” and “b2″ factors are entered in:
  = b2 * Exp(a * x)
as shown above.  Alternatively (and more conveniently) the “b1″ coefficient may be used directly in:
= Exp(a * x + b1)

The second formulation may conveniently be used in the Trend function:
= Exp(Trend(Ln(Y_1), X_1, x))
which gives exactly the same result in a single step.  In this case the exponential fit has given almost exactly the same results as the linear fit, but in other cases there will be a much greater difference, as will be seen below.

The workings of the exponential fit are shown more clearly in the example below, where the Ln values have been calculated on the worksheet, and plotted with a linear trend line:

Linest5-2a

Plotting Ln(Y_1) against X_1 it can be seen that the result is not an exact straight line, indicating that the data does not fit an exact exponential curve.

An alternative trend line for data with steadily increasing curvature is a quadratic curve:

Linest5-3

A quadratic curve is of the form:
Y = a * x^2 + b * x +c

To generate the three coefficients using Linest enter:
=Linest(Y_1, X_1^{1, 2})

This returns three coefficients, a, b, c, which are exactly equal to the values generated by the chart quadratic curve fit.  These values can be entered in the quadratic formula given above.  Alternatively the Sumproduct function can be used:
=SumProduct((A93:C93), E93^{2,1,0})

The Trend function can again be used to conveniently generate the same result in a single step:
=Trend(Y_1, X_1^{1,2}, x^{1,2})

Note that using the trend function the x value must also be converted to an array consisting of x and x^2, using the notation x^{1,2}, as for the X_1 input data.

In some cases a cubic curve may provide a better fit than a quadratic.  The curve coefficients may also be generated using LinEst:

Linest5-4

The function format in this case is:
 =Linest(Y_1, X_1^{1, 2,3})
which returns four coefficients, a, b, c, d, as shown above.

As before, these results can be used for interpolation using the cubic equation:
= a * x^3 + b * x^2 + c * x + d

The trend function can also be used with a similar format to the quadratic curve:
=Trend(Y_1, X_1^{1,2,3}, x^{1,2,3})

Comparing the Linest results with a chart cubic trend line we can now see that the coefficients returned are entirely different:

  •  The chart trend line has returned a quadratic function (with exactly the same coefficients as when quadratic was the chosen function).
  • The LinEst function has returned a non-zero coefficient for the x^3 term, and zero for the x term.

For the interpolated value at 30 days the four different curves have all given different results, but the differences are small.  When the same curves are used for extrapolation (i.e. finding Y for an x value outside the range of the original data) the results are very different:

Linest5-5

The linear and exponential results are still fairly close (but they would diverge rapidly for longer time periods), but the quadratic result is now about 30% higher than the linear estimate, whereas the cubic reached a peak and has then gone negative.  Note that not only has the cubic curve generated with the LinEst (or Trend) function given a result widely different from the quadratic curve, is equally distant from the result of using a chart cubic trend line!

This may look like a bug, but it is in fact a result of “over-fitting”.  This can be illustrated using a third Excel option for curve fitting, the data analysis tools.  In recent versions of Excel the Data Analysis tools are found at the right hand end of the Data Ribbon.  In Earlier versions they are included in an analysis tool-pack, which needs to first be installed.

For the purposes of using the Regression Tools for fitting a polynomial curve (i.e. a quadratic, cubic, or curves with higher powers of x) the x data for each required power must be generated on the spreadsheet as shown below:

Linest5-6

To carry out a regression analysis, having generated the required data, click the Data Analysis icon, then scroll down and select the Regression function, which will open a dialog box.  If the chosen X range consists of the original three rows and three columns (i.e. x, x^2 and x^3) the message shown above will be returned.

The problem is that a cubic curve, being defined by four coefficients, needs four points to define a single curve.  There are an infinite number of different cubic curves that will pass through any three points, including curves with a coefficient of zero for the x^3 term (as returned by the chart cubic trend line) and with a coefficient of zero for the x term (as returned by the LinEst function).

To get the regression tools to return results we need to define a fourth point, but if the fourth point is a duplicate of any one of the original three we get no complaint:

Linest5-7

The results now are exactly as generated by LinEst (click on any image for a full size view):

Linest5-8

The additional detail provided by the regression tool shows that the generated line is an exact fit through each data point.

If we now change the fourth point to the value of Y generated by the quadratic fit for X = 30:

Linest5-9

the returned coefficients are now exactly (to machine precision) the same as those returned by the LinEst function for a quadratic fit (and also the chart trend line for both quadratic and cubic trend lines).  The error at each point is again zero, showing that the LinEst quadratic coefficients were also an exact fit for the original three points.

In Summary:

  • There are many curves that will pass exactly through any three defined points, including a single quadratic or circular curve, or an infinite number of different cubic (or higher polynomial) curves or ellipses.
  • Which is the “best fit” depends entirely on the nature of the base data.
  • To choose the best fit requires either more data or an understanding of the process generating the numbers.
  • In the case of futures pricing the exponential curve would probably be the best fit, but the fact that the generated curve was not an exact fit to the supplied data suggests that the analysis needs to be done by someone with a good understanding of how futures pricing works.
  • Using Linest to fit a higher order polynomial to a small data set gives results that are mathematically correct, but may be misleading.
  • Using the Trend function is convenient, but gives no hint of the hidden complications, so use with caution.
Posted in Curve fitting, Excel | Tagged , , , , , , | 1 Comment

Sheila Chandra; This

Sentence

Is

True

From:

“This Sentence Is True” (The Previous Sentence Is False)

Posted in Bach | Tagged , , | Leave a comment

Conbeam – Python version

I have converted the continuous beam analysis spreadsheet (last presented here) to Python code, linked to Excel VBA User Defined Functions (UDFs) with ExcelPython.  To run the Python version you will need an installed copy of Python, including Numpy and Scipy.  Everything else is included in the download file, including all the necessary ExcelPython files.

The spreadsheet (including full open source code) can be downloaded from Conbeampy.zip.  To run the spreadsheet just unzip everything to any directory.

The file ConBeampy.xlsb includes all the new functions with one example of each.

ConBeampy-check.xlsb also includes checks of the Conbeam function results against results from Strand7, for 15 different configurations of beams with different span arrangements and support conditions.  The screenshot below shows there was close agreement for all 15 cases.

Conbeampy1-1

Click for full size view

More examples of using each of the included functions will be provided in future posts.

Posted in Beam Bending, Excel, Finite Element Analysis, Frame Analysis, Link to Python, Newton, UDFs, VBA | Tagged , , , , , , , | Leave a comment

Dealing with dates 3: opening and saving csv files without data corruption

This post makes use of a spreadsheet for reading data from text files, including comma separated value (csv) files.  The latest version includes a routine for saving the imported data in csv format.  It can be downloaded (including full open-source code) from Text-in2.zip.

Excel allows csv files to be opened and saved directly from the File menu, without going through the File-Import process.  Unfortunately the data is treated in the same way as text entered into a spreadsheet cell formatted as General, so any text that looks like a number will be stored as a numerical value, and displayed in the format that Excel deems appropriate.  If the file is saved with the original name the data will be changed, which in some cases result in dates being changed, or numeric codes being converted into dates.

This is illustrated with the short csv file below, consisting of four columns:
– Numbers
– Text that looks like numbers
– Dates
– Text that looks like dates

Dates-3-1

The screenshot below shows the results of opening this file in Excel with default language set to Australian English, then saving it as csv, without making any changes.

Dates-3-2

It can be seen that changes have occurred in all the columns:

  • Fractions are treated as dates; 3/4 becomes 3-Apr
  • Leading zeros in numerical text strings are deleted
  • Numerical text with a single E is treated as scientific notation.
  • “” around a text string is deleted and ignored
  • Any separator in a text string is converted to /.
  • Four digit years are sometimes converted to two, and vice versa
  • All the entries under “Look like dates” have been converted into dates, except for those with numbers separated by spaces.

Changing the default language to US English results in significantly different results for the dates, and strings interpreted as dates:

Dates-3-3

  • The fractions (3/4) are converted to 4-Mar, rather than 3-Apr
  • Date strings with two numbers are treated as mm/dd, rather than dd/mm, except when the first number is greater than 12.
  • Numeric date strings with three numbers display with the day and month numbers unchanged, but are treated in Excel as being mm/dd, rather than dd/mm.
  • Unambiguous date strings, are left with the day/month order unchanged.

CSV files can now be opened and saved without any unintended changes using the Spreadsheet Text-in2 (download link at the top of this post), which has three new features:

  • Optional formatting of each column
  • Text2Date() function to convert any date string into an Excel date value, optionally with a specified day/month/year sequence.
  • Save imported data to a csv file.

If each column of a csv file is imported as text (format code = @), the data is all imported with no changes:

Dates-3-4

For simple applications numbers imported as text will be treated as numbers by Excel, but for many purposes they must be converted to numbers for everything to work as expected; for instance the Sum function treats all text as having value zer0.  Any column containing numeric data can be imported as numbers by entering the appropriate format code; for example enter 0.00 to import as numbers and display to 2 decimal points.  See the Text-in2 NumFormats sheet for more examples of number format codes.

The dates are imported as text, but can be converted to Excel date values using the Text2Date function.

When the imported data is saved to a csv file all text remains exactly as in the original file:

Dates-3-5

Text2Date optionally allows the date order to be specified, with the System local default being used when not specified.  Date order options are:

0: mm/dd/yyyy  (default for US English)
1: dd/mm/yyyy (default for non-US English)
2: yyyy/mm/dd

Dates-3-6

The function also has an optional Separator argument, which should be specified for any separator other than /.

The screenshot below illustrates use of Text2Date with DateOrder set to zero.  Note that in this case the function returns an error message (“Not a valid date string”) if the first number is greater than twelve, or the second position has a month name, rather than a number:

Dates-3-7

Also note that in this case the format for column 1 was set to 0.00 (rather than @), so this column has been imported as numbers, rather than text.  The fractions in rows 32 and 36 have been converted to decimal values, but the text in column 2 remains unchanged.

More details of the Text-in2 spreadsheet (plus other text functions) can be found at:
Daily Download 32: Text functions

Also see: Extracting numbers with regular expressions

Dates-3-8

Click for full size view

Posted in Excel, UDFs, VBA | Tagged , , , , , | 2 Comments

Dealing with dates 2: Getting cell data type and format code

When working with imported data, or data from an opened csv file, it is often important to know the data type that has been used to store the data.  This may not be obvious, for instance in the previous post we saw that a date imported into a cell formatted as text displays as text, but is converted into a date value when used in a simple formula.  However if the cell is included as part of a range it will be treated as text, with a value of zero, even for a single cell range.  For instance, if we format cell A1 as text and enter 8-jun the cell will display as entered (including the lower case j), but in another cell:

  • =A1 will display 8-jun
  • =A1+0 will (in 2015) display  42163 (the date number for 8 Jun 2015)
  • =Sum(A1) will display 0 (because all text is treated as having a value of 0 when used in the Sum function, even if it looks like a date).

If we now format the cell as General, and re-enter 8-jun, the cell will display 8-Jun and =Sum(A1) will display 42163 (if it is formatted as a number), or 8-Jun (if it is formatted as a date).

The User Defined Function (UDF) below will return information about the data type for the values in a selected single column range:

Function GetDType(DRange As Range) As Variant
Dim DTypeA() As Variant, DVal As Variant, NumRows As Long, i As Long

DVal = DRange.Value
NumRows = UBound(DVal)
ReDim DTypeA(1 To NumRows, 1 To 2)

For i = 1 To NumRows
    DTypeA(i, 1) = DVal(i, 1)
    DTypeA(i, 2) = TypeName(DVal(i, 1))
Next i

GetDType = DTypeA
End Function

The function must be entered as an array function (using Ctrl-Shift-Enter), and returns two columns.  The first shows the actual value stored by Excel (as opposed to the value displayed), and the second the data-type:

Dates-2-1

To work with number formats in VBA (which we need to use to open csv files without changing the data) we need to use Excel’s format codes, but there is no built-in way to show the correct format code for any selected cell format.  An excellent and (almost) comprehensive  guide to Excel number formats can be found at: A comprehensive guide to Number Formats in Excel, but for a quick way to find the correct number format, the UDF below will return the format code for any selected cell:

Function GetNumformat(Target As Range, Optional UseLocal As Boolean = True)

If UseLocal Then
    GetNumformat = Target(1, 1).NumberFormatLocal
Else
    GetNumformat = Target(1, 1).NumberFormat
End If

End Function

Examples of some format codes are shown below:

Dates-2-2

Click for full size view

 

Each format comes in a standard and local version; the local version is returned by default.

Full details of how to generate these sometimes lengthy codes are given in the article linked above, other than for the start of the codes associated with Long Date and Time ([$-F800] and [$-F400]).  A search on these codes suggests that the answer to the question of what they mean is, no-one knows (my lord), but a bit of experimentation shows that if the code between the square brackets is applied as a custom format (including the brackets) a date displays in date format on the spreadsheet, but in date number format in the edit bar, whereas applying the full code the date format displays in both the worksheet and the edit bar.

These functions plus routines to allow data from csv files to be quickly imported and saved without corruption will be available for download in the next post in this series.

 

Posted in Excel, UDFs, VBA | Tagged , , , , , , , | 6 Comments