Update 28 June 2015: Also see Using Linest for non-linear curve fitting examples, hints, and warnings for more examples of fitting exponential and polynomial curves using LinEst.

A frequent question on internet forums everywhere is how to do a least squares fit of a non-linear trend line to a set of data. The most frequent answer is to plot the data on an XY (“scatter”) chart, and then use the “Fit Trendline” option, with the “display equation on chart” box checked. The chart trendlines have the options of: Linear, Exponential, Logarithmic, Polynomial (up to order 6), and Power. There is also a “Moving Average” option, but this does not provide a trendline equation. The chart trendline solution is OK if what you want to do is display the trendline equation on a chart, but if you want to use the numbers in some further analysis, or even just display them elsewhere in the spreadsheet, or copy them to another document, it is far from convenient. Fortunately it is straightforward to get the trendline equations (and other statistics) for each of the chart trendline types using the LINEST worksheet function.

I have created a spreadsheet with examples of each trendline type, which may be downloaded here:

The functions used for linear and polynomial trendlines are shown in the screenshot below (click image for full size view):

Note that:

- The functions as displayed use named ranges (X_1 to X_3 and Y_1 to Y_3)
- The functions are entered as array functions to display all the return values; i.e. enter the function in a cell, select that cell and sufficient adjacent cells to display all the required values, press F2, press Ctrl-Shift-Enter.
- Alternatively the INDEX function may be used to return specific values; e.g. to return the b value from the linear example use =INDEX(LINEST(Y_1, X_1),2)
- Higher order polynomial functions may be returned by simply adding to the list of powers in the curly brackets (but note that this is often not a good idea because of “over-fitting“)

Functions for exponential, power, and logarithmic trendlines are shown below:

In this case the process is not quite so straightforward, because in most cases one or both of the values returned by the function must be modified to give the values shown in the chart trend lines. For these lines it is possible to use either the LINEST function, or the LOGEST function, but since LOGEST simply calls LINEST internally, and provides little if any extra convenience, it does not seem to provide much value. In these examples note that:

- Equations are in the form: y = a.e^bx (exponential), y = a.x^b (power) or y = b.ln(x) + a (logarithmic). In each case in the examples the power factor (b) is shown in bold, and the constant term (a) is shown in bold and italic.
- The LOGEST function returns an equation of the form y = a.b^x
- The LINEST function will return exactly the same values if entered as =EXP(LINEST(LN(Yrange), XRange)), and this line is equivalent to the y = a.e^bx line returned by the chart.

Update 27 Jan 2011:

Coincidentally, Chandoo at Pointy Haired Dilbert is also running a series on estimating trend lines in Excel, which is well worth a look at: Are You Trendy

Pingback: Using Excel statistical functions for trend analysis. | Chandoo.org - Learn Microsoft Excel Online

I would really like to understand how microsoft’s trend line for a +4th order polynomial’s trend line always comes out smooth? What do they do in their algorithm that is different from the Givens (Least Squares) method that almost any other curve fitting program can duplicate up to a 3rd order polynomial (well, the coefficients match at least), but after that MS Excel’s formula’s simply don’t match any other results???? And the kicker is that if you plot their solution, the trend line ALSO does not come out the same? MS Excel has become a “standard” for regression analysis in non-linear systems, so it would be really appreciated if they would provide insight into their algorithm’s.

LikeLike

Scott – can you give an example? With the same data I used for the curves up to cubic above (i.e. a circular quadrant with 11 points and radius 1), for a quartic I get:

-5.03771915 8.14914828 -4.68693650 0.64996956 0.99330532

with descending powers of x.

I get exactly the same coefficients from the chart trend line.

This is with Excel 2010, but I believe that the algorithm is unchanged since 2003. There are some acknowledged issues with earlier versions.

I’ll have a look with different software later, but that will probably have to wait to the weekend.

LikeLike

I’m referring to polynomial trend lines greater than 4th order. I’m trying to develop a program to curve fit signals with very steep sideband slopes, and when I use Excel, the skirts are well adjusted to the curve (6th order polynomials in this example), but the trend line itself is incredibly smooth, and I have significant sign changes throughout the data, which should be reflected in the curve fit as well, and simply isn’t. When I compare polynomial coefficients, they too are significantly skewed, its almost as if MS Excel smooths the data, or performs some other optimization that I can not account for in my algorithm, however, if you take the trend line formula provided in Excel and plot it against the data in a different plotting tool (Matlab, LabVIEW, etc…) the curve fit line does not appear to resemble the plot in Excel. So, I’m really curious as to how accurate the curve fit in Excel truly is, it could be as simple as different accuracy weighting as well, but I am struggling finding a solution that matches Excel.

LikeLike

Scott – having looked at it again, there does seem to be a bug in the Linest results with polynomials of order 6 and higher. I have just fitted a polynomyial to the function y = 1/(x-4.99) for 100 points between x = 5 and x = 6. I used Linest, Excel chart trendline fitting, and the Alglib PolynomialFit routine.

I found that up to 5th order they all gave esentially the same results (but with rounding differences becoming significant at 5th order). For 6th order the chart line and the Alglib line were very close, but the Linest line still followed the same form as the 5th order (i.e. maxima and minima were at about the same positions). For 7th order the Alglib line changed (as you would expect), but the Linest line again stayed close to the 5th order position, and the chart line was past its limit.

I will write this up and post a UDF with the Alglib routine in the next few days. Drop me an e-mail (dougaj4 at google) if you would like a copy of the spreadsheet as it stands. You might also like to have a look at the Alglib site which has some information about how they approach the problem.

LikeLike

The only discussion of a similar problem I found in a quick search was here:

http://www.officekb.com/Uwe/Forum.aspx/excel-chart/13526/Excel-2007-Polynomial-Order-Incorrect

The responses were not very helpful unfortunately.

LikeLike

Doug – I don’t think it’s a bug but a consequence of collinearity see http://support.microsoft.com/kb/828533. A column may be excluded if the sum of squared residuals on a regression of the other predictor variables is small, see http://en.wikipedia.org/wiki/Multicollinearity. Unfortunately, it’s not clear from the description exactly how linest chooses in which order to exclude columns.

An equivalent of LINEST(Y,X) that doesn’t exclude columns is:

=MMULT(MMULT(TRANSPOSE(Y),X^N),MINVERSE(MMULT(TRANSPOSE(X^N),X^N)))

where N={1,0} for a linear fit, N={3,2,1,0} for a cubic fit, etc (X non-zero). However it’s not as numerically stable as the QR decomposition method that linest uses.

LikeLike

Lori, would you mind ellaborating on your collinearity theory, I’m not understanding why that would cause the problem I am seeing.

LikeLike

Scott – The remarks were in response to Doug’s specific example, i can’t claim that they are necessarily relevant in your case but it does show that the algorithms used may become significant for higher order polynomials.

In my tests fitting a 6th degree polynomial gave an x^5 coefficient of zero for linest. This accords with the kb article since the results of linest(x^5,x^{1,2,3,4,6},,1) show that 1-R2=RSS/TSS<1e-16 so x^5 is omitted.

It’s also interesting to compare with fitted values calculated from mmult(X^N,transpose(b)) where b is given by the formula i posted above. For N={6,5,4,3,2,1,0} there is close agreement with linest but not with the chart line. If there is a follow up post a chart of this may shed more light.

LikeLike

Pingback: Non linear regression – 1 | Newton Excel Bach, not (just) an Excel Blog

I have used this function often for the pricing of fixed income securities, is is possible to modify it to allow for missing data? If for instance I do not have data in either the X or Y sets for specific row in my spreadsheet?

LikeLike

Jeff – To find a y-value for a given x-value using a fitted cubic curve, you can try:

=TREND(Yrange,Xrange^{1,2,3},Xvalue^{1,2,3})

Or to find an x-value for a given y-value you can try to find the root of the cubic using:

=IRR(LINEST(Yrange,Xrange^{1,2,3})-{0,0,0,1}*YValue)*1+1

These formulas are easily extended to other powers but you may be better off following the posts on splines for interpolating missing data.

LikeLike

Lori, can you explain more as to exactly what IRR does? Office just says that it is used to find the Internal Rate of Return. I think this is the solution I was needing for my problem but I want to make sure. I have data that I want to find a 5th order polynomial equation to. Then, given a Y-value, find the X-value. The answers that I’m getting using this formula seem to be reasonable, I just don’t know enough about the function of IRR to be confident its what I need.

LikeLike

Tom – I’ll be interested to see Lori’s comments if she drops by, but my explanation of how it works is given here:

https://newtonexcelbach.wordpress.com/2011/12/01/linest-npv-irr-and-solving-polynomials/

Doug

LikeLike

Doug – Thanks for the explanation which is much more comprehensive than mine would have been!

Tom – As a check, try inputting the X Value returned into the TREND formula above and verify that the result equals the original Y value. In fact, a slightly simpler verion of the formula is:

=IRR(LINEST(Yrange-YValue,Xrange^{1,2,3}))*1+1

I would like to add however, that while polynomial approximations can be very useful in theoretical analysis, there are rarely compelling reasons for choosing polynomial fits with empirical data. Models should be based on a priori assumptions as far as possible to avoid problems of data-snooping. If you need a nonlinear approximation for estimation purposes there are a multitutde of other smoothing methods that are often preferable.

LikeLike

Actually on rereading, the question is not directly about finding missing values but rather

how to allow for missing values in data?For a linear fit, SLOPE, INTERCEPT, RSQ and FORECAST skip rows containing blanks. For a nonlinear fit, it’s more challenging since for example LINEST(Y,X^{1,2,3}) errors if there are any blanks in the range. One approach is to try instead:

=LINEST(ISNUMBER(X)*Y,IF(X<>0,ISNUMBER(Y)*X^{0,1,2,3},0),0)

The results should match the values given in the chart trendlines. An extension that also allows for filtered data is to replace ISNUMBER(X) in the formula above by SUBTOTAL(3,OFFSET(X,ROW(X)-MIN(ROW(X)),,1)) and the same for ISNUMBER(Y). Similar substitutions can be applied to other formulas and you can assign names to the expressions for X and Y to keep things simple.

Clearly, there are other ways to achieve the same results, the obvious one being to make a copy without the rows containing the missing values, however this either needs to be done manually or a macro setup to do it for you each time which is less efficient.

LikeLike

Pingback: Using LinEst() on data with gaps | Newton Excel Bach, not (just) an Excel Blog

Jeff: see https://newtonexcelbach.wordpress.com/2011/05/14/using-linest-on-data-with-gaps/

Lori – thanks for the on-sheet solutions. I’m still trying to work out how the second one works!

LikeLike

Pingback: Data Analysis using Linest and the Data Table function. | Chandoo.org - Learn Microsoft Excel Online

I need to use linest to find the coefficients ‘a’ and ‘b’ that fit the curve y=a(1-exp(-bx)) to my data set. Judging by the comments here there are some clever people who would know how to do that, however I’m not one of them, it has me stumped! Thanks to anyone who can offer any help…

LikeLike

Malcolm – as far as i know the easiest way is using the Excel Solver, as given by electricpete at eng-tips:

http://www.eng-tips.com/viewthread.cfm?qid=311283

To use Linest you have to be able to convert the function to a linear function of some functions of x.

LikeLike

Pingback: Linest, NPV, IRR and solving polynomials | Newton Excel Bach, not (just) an Excel Blog

Hi all, multi-collinearity causes huge problems in polynomial regression if the range of x-values extends to values much larger than 1 in magnitude. For example, you might want to calculate the correlation coefficients between x, x^2, x^3 and so on if x = 20,21,22,23…,30; they are all close to 1. This can even cause fatal interpretation errors if the estimated confidence intervals are used for some kind of error propagation analysis without accounting for the large co-variances between the regression coefficients. The only way out of that trap is to use orthogonal (or even orthonormal) polynomials like those of Legendre as basis functions. As these are defined on the interval [-1;1], the variables have to be transformed before the regression takes place. And always mind the co-variances between the coefficients in an error propagation analysis, if you work with the coefficients for x, x^2…!

The usage of at least orthogonal polynomials is the only method that allows to reliably detect non-linear relationships far away from the origin, for example, when you want to do a non-linear regression of income on age (30 – 70) or so.

LikeLike

Georg – Good points, some of these were alluded to in comments from the follow-up

post. For computing the coefficients, LINEST/TREND can be applied to data centered around the mean and the results are in close agreement to other high-precision polynomial regression algorithms. The QR/SVD decomposition methods for calculating least squares estimates can be seen as finite dimensional analogs to orthogonal polynomial expansions of L^2-functions.

LikeLike

I would like to see how others graphically show the col-linearity – I have just posted on http://www.excelfox.com how to get a correlation map using colors. The post is under the heading

“Using property ColorScaleCriteria color you cells” (in the download center) – what it shows is a correlation matrix of 3 wet chemistry assays (Y variables) and Absorbencies as the X data — so these are adjacent frequencies (X values) with a very high covariance.

Anybody who has examples of how to graphically show a map similar to what I just posted would be appreciated.

PS – I had to cut out data (max file size is 100kb) – so if you plot the X-values you will see ‘noise’ in the spectra.

LikeLike

Rasm – I couldn’t log in to view your file, but for a visual plot of collinearity in three regressor variables I would plot one variable against a best fit linear combination of the other two. It’s insightful to do this for Doug’s polynomial example mentioned above.

Starting from a blank workbook, here’s a few steps to set up the chart and plot the data, no datasheets or code modules are required. You can just press Alt+F11 and enter sequentially in the immediate window, the corresponding UI commands should be self-evident.

charts.add

activechart.seriescollection.newseries

set s=activechart.SeriesCollection(1)

s.Type=xlXYScatter

names.add "x", [4.99+row(sheet1!1:101)/100]

names.add "y", "=x^3"

names.add "z", "=trend(x^3,x^{1,2})"

s.formula = "=series(,sheet1!y,sheet1!z,1)"

s.trendlines.add DisplayRSquared:=True

s.trendlines(1).datalabel.numberformat="0.000000000000000"

This gives a near exact straight line with R^2=0.99999946. Choosing Debug>Add Watch with [linest(x^3,x^{1,2})] also gives the same value in the (3,1) element. Extending to [linest(x^6,x^{1,2,3,4,5})] gives R^2=1 exactly to 15dp, so x^6 coefficeint is dropped from the calculation of coefficients as it adds no more information.

To reduce the collinearity the first step is to center around the mean by changing 4.99 to -0.51 above this gives R^2=0.8401098. The second step is to transform the columns so they are uncorrelated, this is what Excel and other least squares methods do “behind the scenes”. Choosing the cubic Legendre polynomial in place of x^3 as below gives R^2=0.0019898.

names.add "x", [(x-average(x))*2]

names.add "y", "=2.5*x^3-1.5*x"

LikeLike

Lori

I will try your method – but I have several 100s X values – up 1050.

I do preprocess the data i.e. SNV and typically a 1st derivative (with a smooth and a gap) – next I mean center the data. I typically find the best model using PLS or MLR. The data I work with are spectra – that is why I have extreme col-linearity – the dependent variable is typical a concentration. But I do find the approach described in this threat very interesting. Again thanks for your reply – I will try your method – may give me some inspiration.

LikeLike

In order to compare the extent of collinearity of two vectors V1 and V2 to our everyday experience of Euclidian sapce, it might help to calculate the angle A12 between them according to the formula

A12=180/PI()*ARCCOS(SUMPRODUCT(V1,V2)/SQRT(SUMPRODUCT(V1,V1)*SUMPRODUCT(V2,V2)))

A result of 0 means totally collinear and 90 means totally independent.

LikeLike

Hi,

I want to find out the trendline for a set of data (x,Y) but I want to get the trendline in the form of sin(x) and cos(x). However, using the excel it is only possible to have it in the form of linear, power, .. but not the sin or cos(x). How you think I can solve it?

LikeLike

Well asafa, if you really meant A*sin(x)+B*cos(x), you could use linest because your model is linear in the parameters. But I guess you want to solve something like A*cos(B*x)+C*cos(B*x) in order to determine a spectral component. As this model is non-linear in the parameters, you have to use a non-linear least squares method, for example, Excel’s solver. My experience of 20 years of NLLSQ fits lets me strongly recommend to use VBA to interface to an external DLL that allows for suppying the derivatives of the model function with respect to the parameters analytically. Doug has some posts on how to interface to Fortran. The procedure for interfacing to C is quite similar. There are free NLLSQ routines available for download on the web (NetLib, AlgLib,…). You could use the free CAS Maxima, e.g., in order to determine the analytical derivatives if your models become more complicated than just being the sum of a sine and a cosine.

LikeLike

asafa and Georg – see:

https://newtonexcelbach.wordpress.com/2011/03/01/non-linear-regression-1/

and

https://newtonexcelbach.wordpress.com/2011/03/05/non-linear-regression-%e2%80%93-2-alglib-functions/

for post on using the Excel Solver and the Alglib routines for non-linear regression in Excel.

I agree with Georg that for anyone doing serious work on this the purpose written routines such as those from Alglib offer much better performance than using Solver.

LikeLike

Hi everbody, great site!!!. I am working in forcasting project, and its so hard to read in the chart the exactly number of the trend, I am looking for information, how to get the values of the linear, polynomial, exponential etc. lines but in values, for example my data is A1:A20 so I would like to see on B the linear values, C logarithmic, D polynomial etc. how can I do this, please any help welcome.

LikeLike

Baum – Have a look at the download spreadsheet. It contains the examples shown in the screenshots which returns the data you want. If anything isn’t clear, please ask.

LikeLike

anyone knows any video link to see exactly how to get the coefficients trend lines.

LikeLike

Hi,

My name is Zack. I was wondering if there was any way to program an equation with a missing variable in excel, and have it calculate for the missing variable.

I.E.) X=Vo*t + 1/2*a*t^2 when knowing what x, Vo, and a are equal to, to try and find what “t” is.

or I.E.) V=Vo + a*t knowing Vo, a, and t to find “V”

If anyone could tell me how to do this with several other equations, I would greatly appreciate it.

my email is “zackgane@yahoo.com” thank you

LikeLike

Zack – for your first example you can use the Excel Goal-seek function (under the Data tab in 2007/2010). Also have a look at the Solver which gives more control and can solve more complex problems (more than one unknown for instance). There are also some posts here with UDFs to solve polynomial equations, which will do the job more quickly and conveniently than Goal-seek if you have a lot of them:

https://newtonexcelbach.wordpress.com/2010/08/04/solving-cubic-and-quartic-equations-with-excel/

https://newtonexcelbach.wordpress.com/2011/01/13/solving-higher-order-polynomials/

For the second example you can just enter the formula in a cell to find V, but maybe you meant to find t? In that case you can use the same methods as for the first example, but some simple algebra will give the result: t = (V-Vo)/a. There is a formula solution to the first example as well; look up quadratic formula.

LikeLike

Pingback: 4 Year Report | Newton Excel Bach, not (just) an Excel Blog

Hi

This is shiva

I have some points that should fit in a bell curve..

What i do to get the ordinates for any point.

Email. shiv_yers@hotmail.com

LikeLike

hi, i’m a final year civil engineering student and i’m doing a spreadsheet for the design of steel components using microsoft excel.. does anyone know how to generate the bending moment and shear force diagrams in excel..? thanks 🙂

LikeLike

Hawwa – have you tried the Internet?

There are a stack of programs that will generate bending moments and shear forces, with different levels of complexity.

If you are looking for a continuous beam analysis you could start here:

https://newtonexcelbach.wordpress.com/2012/05/03/using-udfs-continuous-beam-example-3/

LikeLike

Pingback: Daily Download 14: Curve Fitting 1 | Newton Excel Bach, not (just) an Excel Blog

Noob question and perhaps attributed to a superficialy understanding of arrays in excel (amongst other things) but for the life of me, I can’t understand why in the logarithmic example, the forumlas in cells O55 and P55 return different values but appear to be exactly the same. What subtle point am I missing?

LikeLike

Scott – yes it is the use of array formulas that is confusing if you are not used to them.

The two cells contain the same formula, but it returns an array rather than a single cell, in this case the array is 1 row x 2 columns. The values are a and b in the formula

a(ln(x)) + b. You can see that the values match those of the trend line in the chart starting at C50.

To return both results of the array:

– Enter the formula in O55 in the usual way.

– Select O55:P55

– Press F2

– Press Ctrl-shift-enter

You should now get both results, and the formula will display in the edit line with {} around it, the same in both cells: =LINEST(Y_3,LN(X_3)).

You can also select O55:P55 to start with and enter with ctrl-shift-enter.

More details at: https://newtonexcelbach.wordpress.com/2011/05/10/using-array-formulas/

LikeLike

Thanks for the quick reply 🙂 Last question would then be whether its possible to actually have all of this in a single cell (i.e. a single formula that would return the value of y in a cell by calculating a(ln(x)) + b) or do I essentially have to calculate a and b in separate cells like in your file and then do another cell to calculate y?

LikeLike

Scott – it’s possible but the formula would get quite long. You can use the Index function to return any value from an array, the same as if it was a range, so:

=INDEX(LINEST(Y_3,LN(X_3)),2) will return the b value.

you would end up with something like:

=LINEST(Y_3,LN(X_3)) * LN(x) + INDEX(LINEST(Y_3,LN(X_3)),2)

But personally I’d rather return the results of the array formula in two cells, and use a third cell to get the desired result.

LikeLike

Doug,

Thanks for your answer (on 4-26-13 @ 921am) to Scott’s question. I had the same question. After following your directions, I was able to recreate the array with the right constants. But, when I input an x value of 2, I get 1.14, when I think I should ave gotten 9.82. Can you help me understand what I am doing wrong? My formula was =O51*(LN(A37)+P51).

Best,

John

LikeLike

John – have another look at the original post, starting at “In this case the process is not quite so straightforward …”

The problem is that Linest will only fit a straight line to the data, so if we want a non-linear fit we have to convert the data into a form that is a straight line, get the coefficients for that line, then convert it back to the non-linear form we want.

In the case of the exponential curve we fit a straight line to ln(y) = ax + b, so y = Exp(ax + b) = Exp(b) * Exp(ax)

For the example in the spreadsheet, Exp(b) is in cell Q51, so to return the y value for x = 2 (in cell A37) you need:

= Q51 * EXP(O51 * A37)

Note that this returns 10.411, rather than 9.82, because you are fitting a smooth curve to scattered data. Looking at the Exponential chart you can see that in this case the exponential curve is not a good fit to the data. The power curve gives a much better fit, but even in that case extrapolating past x = 11 would probably rapidly diverge.

LikeLike

Thanks for the information. I want to ask one thing that in power series trend line, if the x axis is logarithmic will there be any effect on the values. Please reply.

LikeLike

The easiest way to check is to try it and see, but the answer is no, making the scale logarithmic only changes the way the graph is plotted, it doesn’t change the trend line results.

LikeLike

Your blog is very useful. As a new VBA user I’m having touble to extend this NL fit case a simple on variable case. Have tried bchanging both AL_NLFit and AL_NLFitText without success. I hit a wall that says : cannot change part of an array. I even tried tp create a new one with the insert function for yusing UDF but could not get it to work.

Can you please help me? I’m a curious student in BA and I want to learn how to this. Thanks in advamnce.

LikeLike

Have a look at:

https://newtonexcelbach.wordpress.com/2011/05/10/using-array-formulas/

If you still have problems after reading that you could send a sample file to dougaj4 at the usual google mail address.

LikeLike

I did it and it works!

Thank you very much!

LikeLike

Thanks for the information! I have a question, how do I calculate the b, c and d for the Cubic curve estimation example? When I use the formula =Linest(Y_1,X_1,^{1,2,3}) I get the number presented as a (-2,08199), but how do I calculate the numbers b, c and d?

LikeLike

Lenn – sorry for the delay in replying.

You have to enter the Linest function as an array function to display all the results.

If you already have the function entered and displaying the first result then:

-Select that cell and the three adjacent cells to the right.

-Press F2 to enter Edit mode.

-Press Ctrl-shift-enter

The four results should display in the selected cells.

LikeLike

Thanks for the explanation! That makes sense!

LikeLike

Thank you for the file.

LikeLike

I know this is a blog in its senior years …

Anyway I’d like to make the point that Excel _does_ have issues in this area!

(1) If you look at the R² of a trendline other than polynomials in a chart, it is NOT the same one you

get when you use the resp. worksheet function.

(2) If your y-data are negative, you cannot use trendlines, nor the equivalent worksheet functions,

when you want to model exponential / power / logarithmic approximations.

Even though the real world knows examples of exponential functions that have negative values …

(3) Whether you use trendlines in charts or the linest / growth worksheet functions: the coefficients

you get for exponential / power / logarithmic approximations of your data series will (in general)

NOT be best fit!

By the way, OpenOffice / LibreOffice seem to have similar issues in their CALC components …

Feel free to drop comments if you are interested in examples.

LikeLike

This is discussed in the comments.

Also see the Alglib library for a much more stable version for high order polynomials:

VBA version at https://newtonexcelbach.wordpress.com/2011/02/04/fitting-high-order-polynomials/

I will post an update of the Python Alglib version later today.

LikeLike

I couldn’t download the file

LikeLike

Which file? If you mean linest-poly.xls, it is downloading OK here.

LikeLike

I would like to know how this linest function works for a data set where the prices for a future contract with different days to maturity is given.

For eg : 1302.4 for 21 days

1312.6 for 40 days

1326.8 for 60 days.

I need a price for 30 days to maturity using Linest funtion.

LikeLike

See: https://newtonexcelbach.wordpress.com/2015/06/28/using-linest-for-non-linear-curve-fitting-examples-hints-and-warnings/

LikeLike

Pingback: Using Linest for non-linear curve fitting, examples, hints and warnings | Newton Excel Bach, not (just) an Excel Blog

i would like to know for a second degree polynomial with 5 variables, how to convert it to a third degree polynomial

LikeLike

Can you supply an example. It isn’t clear to me what you want to do.

LikeLike

First of all: thanks for your great two extensive posts on this topic.

Second: I’m going nuts.

I have the following dataset:

1995 1982467844

1996 2267857601

1997 2404755862

1998 2432389525

1999 2444858940

2000 2635654453

2001 2705084500

2002 2995833686

2003 3473777485

2004 3951721285

2005 4619625038

2006 4775887626

2007 5442306488

2008 6312303423

2009 6046918247

2010 6606480677

2011 7278256840

2012 7543592253

2013 8029216606

2014 8148406215

which for all my purposes approximates well enough to a 5-th order polynomial. Indeed, the fitting polynomial given by the excel plot has the coefficients

m5 2.48985094033181E+04

m4 -2.49631251621755E+08

m3 1.00111213806761E+12

m2 -2.00740350104956E+15

m1 2.01258764717869E+18

m0 -8.07110669655566E+20

however, if I try to obtain an estimation of a 5-th order polynomial from the data using:

=TREND(Y_1;X_1^{1\2\3\4\5};X_1^{1\2\3\4\5})

(where I have defined the ranges Y_1 & X_1 accordingly, and where I have to use \ as in my system the comma is the decimal separator) I obtain a much less exact prediction.

If I use INDEX(LINEST(Y_1;X_1^{1\2\3\4\5});1;n) to check the coeficients I obtain are:

m5 -0.205863309

m4 1033.029828

m3 -1382303.147

m2 0

m1 0

m0 1117576429140730

which are very diffferent! If I reconstruct the polynomial with those coefficients I obtain the estimation calculated with TREND.

This got me curious (and nuts) and I started playing around with the degree of the polynomial compating always with the plot fitting. I found that TREND and LINEST are giving me the same solution of the plot ONLY UNTIL a polynomial of 3rd degree!!!! That makes no sense and is not accurate enough for my purposes.

I’ve check my options section, but found no setting that could be related to this. The only different thing to what I’ve seen in several posts is the regional configuration of my machine, but that would be odd. I’m running Excel 2013.

Any ideas? Thanks before hand!

LikeLike

Update: is not related to the regional configuration (of course). Today I checked chaging to using the dot as decimal separator, and the error persist.

Cheers

Francisco

LikeLike

There are a couple of issues here:

1. The Linest function and the chart trend line do give different results sometimes. Linest tends to return coefficients of zero for one or more of the higher powers. It is usually OK up to 4th or 5th degree polynomials, but as you found, for some data it diverges even at the 4th degree.

2. Your x values, as year numbers, are large numbers over a small range, so when you raise them to the 5th power and multiply by a large coefficient you get a very large number, and when you subtract two large numbers with a small difference you get a big loss in precision.

If you replace the x range with 0 to 19 (i.e. number of years from 1995) you will find that Linest and the chart trend line give the same result, which also gives a good fit to the data.

You might also like to try the xlwSciPy spreadsheet, which links to the Python SciPy library (see https://newtonexcelbach.wordpress.com/2016/10/02/xlwscipy-1-09-update-for-xlwings-0-10-and-scipy-0-18-1/). This has an xl_PolyFit function to return the best fit polynomial coefficients, and xlPolyVal to evaluate a polynomial for any x. Both are on the SolvePoly sheet. You need to install Python to use these functions.

One other point you might like to look at is a difference in the R^2 value returned by the trend line and Linest. Both are correct, but use different definitions. See https://newtonexcelbach.wordpress.com/2009/11/03/chart-trend-lines-and-the-linest-function/

Finally, it is worth considering if the 5th degree polynomial is really more accurate, which really depends on what you intend to do with the numbers.

LikeLike

THANKS A LOT! of course it was that. I totally forgot my numerical methods principles. Obviously software has made me lazy XD

Now is corrected and I’m a happy user again. Have a great 2017! Greetings from Chile!

LikeLike

I think the trendline on the graph lies. I’ve created an Excel wrapper regression function that uses variations of the Linest function (Exponential, Power, Linear, Log, and Polynomial up to order 7) which is very accurate. I can plot the known y-values against the function derived values and see which ones match the known curve closest. If I then plot a trendline using the function derived values, the polynomial equation doesn’t have the same coefficients as the Linest function calculates. Additionally, plotting the graphical trendline equation derived y-values doesn’t even match the trendline itself even though it may show a R-squared value near 1. I’d be happy to provide the function to anyone who wants it, or perhaps, Mr. Jenkins can host it here.

LikeLike

It is true that Linest and chart trendlines can give different results, but sometimes they are both wrong (as described in the post). If you really need to fit higher order polynomials there is a much better function here: https://newtonexcelbach.wordpress.com/2011/02/04/fitting-high-order-polynomials/

Also the two can return different R values, both of which are right. See:

https://newtonexcelbach.wordpress.com/2009/11/03/chart-trend-lines-and-the-linest-function/

Also note that if you want to use the trend line results, the output should be formatted to display 15 significant figures (but it’s more convenient to use Linest and get the results as values directly anyway).

LikeLike

Thanks for the reply. Most of my original post comes from applying many of the ideas you referenced. The link is to a spreadsheet that shows my concern. It uses linest in vba to develop the regression coefficients and statistics. I then graph the known independent values with a trendline and equation. That trendline matches up almost exactly with those values despite the fact that the trendline equation isn’t the same as equation that produces the EQN results. I graphed with them is the known values, and they match up with them using that different equation. I then use a version of Ian Huitson’s trendY function that I modified into a subroutine to get all the values of the graph trendline into a column, and I graph them. The values from the trendline bear no resemblance to any of the other lines. Hopefully, you can explain why.

https://www.dropbox.com/s/adyl2io6hnf6hgp/Regression%20Analysis_v1.xlsm?dl=0

LikeLike

Larry – I had a look at your results, but I don’t know how you got your plotted trend line values from the coefficients on the chart. I re-ran the “get trendline values” macro, and got a perfect fit to the graph. I also generated the values myself, using the chart trendline coefficients, and got the same results. Also I got exactly the same coefficients using my xl_polyfit function, so for those values both the chart trend line and Linest give the same good results with a 6th order polynomial.

By the way, your results in Column D link to a function on your hard disk, so display as #NAME?

LikeLike

Sorry about that link. It’s fixed if you want to redownload. I surely can’t figure out why the “Get trendlines values” macro works for you and not for me. Are you using a 64 bit Excel?

LikeLike

Also, for those using Python and xlwings, the xlSciPy spreadsheet:

https://newtonexcelbach.wordpress.com/2016/10/02/xlwscipy-1-09-update-for-xlwings-0-10-and-scipy-0-18-1/

has an xl_Polyfit function, as well as general purpose fitting functions.

LikeLike

I’ll have another look. I’m using 32 bit Excel with 64 bit Windows 10.

I’ll have a look through the code and see if I can find anything that might be throwing out the results.

LikeLike

I think I have found the problem:

The chart in your spreadsheet is a “line” chart, which doesn’t have x range values. The specified x range is treated as labels, even if they are numbers.

If you create a trend line, it treats the x values as being 1,2,3,…

You need to change the chart type to “scatter (XY)”, then the chart trend line will return the same results as Linest.

Also for the TrendY macro to work you should switch off the display of the R2 value (or modify the code so it ignores the R2).

I’m surprised that line charts will display a trend line. It seems like asking for trouble to me.

LikeLike

Yup, X-Y plot did the trick, everything works, and FWIW, the TrendY macro has a check for the R-sqr display. If its there, it turns it off, parses the trendline equation, and then turns it back on. Thanks for your help.

LikeLike

And another FWIW, I added drawing the trendline equation and R-sqr on the chart when you make a selection in the dropdown box.

LikeLike

There is a discussion of this very problem at:

http://www.excelforum.com/excel-charting-and-pivots/537215-linear-trendline-wrong-equation.html

The conclusion is the same: if you are plotting XY data, then use an XY chart (otherwise known as a Scatter chart, even though you can connect the points with a line).

Note that a line chart will only display XY data correctly if the X range is equally spaced increasing points, and it will only display correct trend line data if the X range is a consecutive series of integers, starting at 1.

LikeLike

Pingback: Displaying trend line equations on line charts | Newton Excel Bach, not (just) an Excel Blog

For “Power” equations, when I type =LINEST(LN(Y_2),LN(X_2)) in the example spreadsheet, I can only get the power factor (b) and not the constant term (a). BTW, entering it as an array doesn’t change anything. I know I need to use =EXP(P53) but P53 is the figure I cannot reproduce.

LikeLike

When you enter as an array formula, are you selecting both cells (O53 and P53) first?

– Select both cells

– Press F2

– Press ctrl-shift-enter

If that doesn’t work, please send your spreadsheet to dougaj4 at gmail, and I will have a look.

LikeLike

That worked! I don’t know why, but I guess I don’t care. Thank you so much.

LikeLike

Hello,

I am very new to all of this and am trying to figure it out for a paper I am working on. I am trying to identify irregularities on a tooth surface, so I have a profile that effectively charts the z-coordinates (as the x-axis) by the y-coordinates (distance across the crown). I am trying to work out a method for identifying where the profile fluctuates too much from where it ‘should’ be – so where it is too depressed. A paper that was recently published identified defects as those where the tooth surface “deviates negatively by more than one moving standard deviation from a fitted sixth order polynomial along the shape of the tooth.” So I was trying to figure out using polynomial regression for my own analysis, and while I was trying to figure out how to take the linest-generated regression and use it as data points to calculate residual at each point, I came across this blog. Long explanation, but apart from still being a bit stumped on how to take the linest-generated information and use it for the further steps, I’m now concerned that polynomial regression might be a problem here. The paper I used specified a 6th order polynomial regression, and my data points are very large – for the y-coordinates into 5000 (um) and for the x-coordinates into the 100’s. Are you able to confirm for me whether this would, indeed, be a problem? If so, do you have any suggestions?

Thank you,

(Paper Reference: Henriquez and Oxenham. 2007. An alternative objective microscopic method for the identification of linear enamel hypoplasia (LEH) in the absence of visible perikymata. Journal of Archaeological Sciences: Reports 14: 76-84).

LikeLiked by 1 person

I’m not an expert in statistical techniques, but I would be cautious about using a polynomial for this application. The main problem is that at a sharp change of direction polynomials tend to oscillate, rather than following the actual curve. Also the Excel Linest function becomes unreliable at 5th or 6th order. There are some functions here that perform much better for high order polynomials, that might be better for your application. Search for alglib and scipy. Both require installation of external libraries, but both are free. If you do decide to have a look at them, let me know if you have any problems with installation or running them.

LikeLiked by 1 person

Thank you very much. I’ll work on figuring out another way. I’m a bit confused, to be honest, by both alglib and skipy – they both need programming language, don’t they? A bit beyond me at the moment but I’ll continue to explore.

LikeLike

The spreadsheets are actually set up so you can use the Alglib and Scipy functions without any programming, in the same way as the built-in Excel functions. The documentation in the spreadsheets is very brief though, so for advanced functions you would need to refer to the documentation for the base libraries, which is comprehensive, but not an easy read. If you have colleagues and/or tutors that can help with the maths it would probably be best to use whatever they are familiar with. If you do decide to use one of my spreadsheets feel free to ask for help though. I would be happy to set up an example using your data, if that would help.

LikeLike

Hi, I have data with missing values and I am trying to calculate coefficients for all trend lines (i.e. linear, 2nd order poly, 3rd order poly, power, log & exp).

The examples above for second order poly are great thanks and I have used these to also get the 3rd order polys. However, I am stumped about how to now calculate for the power, log and exponential trendlines.

Any help would be great please!

Thanks, Joe

LikeLike

Thanks for asking!

I was going to say that the examples on the first sheet will also work with the Linestgap function, but they don’t because the input needs to be two ranges, and it wont accept LOG(range), or EXP(range).

You need to set up a third range with the Log or Exp of the data, and this needs to check for blank cells, so for instance for the first example enter:

=IFERROR(LOG(B36),””)

, say in cell N61, and copy that down to N71.

You can then use that range with linestgap:

=LINESTgap(N61:N71,X_2)

which will work for data with gaps.

Please ask if you have any problems getting that to work.

I’ll post a more detailed example in the next few days.

LikeLike