A frequent question on internet forums everywhere is how to do a least squares fit of a non-linear trend line to a set of data. The most frequent answer is to plot the data on an XY (“scatter”) chart, and then use the “Fit Trendline” option, with the “display equation on chart” box checked. The chart trendlines have the options of: Linear, Exponential, Logarithmic, Polynomial (up to order 6), and Power. There is also a “Moving Average” option, but this does not provide a trendline equation. The chart trendline solution is OK if what you want to do is display the trendline equation on a chart, but if you want to use the numbers in some further analysis, or even just display them elsewhere in the spreadsheet, or copy them to another document, it is far from convenient. Fortunately it is straightforward to get the trendline equations (and other statistics) for each of the chart trendline types using the LINEST worksheet function.

I have created a spreadsheet with examples of each trendline type, which may be downloaded here:

The functions used for linear and polynomial trendlines are shown in the screenshot below (click image for full size view):

Note that:

- The functions as displayed use named ranges (X_1 to X_3 and Y_1 to Y_3)
- The functions are entered as array functions to display all the return values; i.e. enter the function in a cell, select that cell and sufficient adjacent cells to display all the required values, press F2, press Ctrl-Shift-Enter.
- Alternatively the INDEX function may be used to return specific values; e.g. to return the b value from the linear example use =INDEX(LINEST(Y_1, X_1),2)
- Higher order polynomial functions may be returned by simply adding to the list of powers in the curly brackets (but note that this is often not a good idea because of “over-fitting“)

Functions for exponential, power, and logarithmic trendlines are shown below:

In this case the process is not quite so straightforward, because in most cases one or both of the values returned by the function must be modified to give the values shown in the chart trend lines. For these lines it is possible to use either the LINEST function, or the LOGEST function, but since LOGEST simply calls LINEST internally, and provides little if any extra convenience, it does not seem to provide much value. In these examples note that:

- Equations are in the form: y = a.e^bx (exponential), y = a.x^b (power) or y = b.ln(x) + a (logarithmic). In each case in the examples the power factor (b) is shown in bold, and the constant term (a) is shown in bold and italic.
- The LOGEST function returns an equation of the form y = a.b^x
- The LINEST function will return exactly the same values if entered as =EXP(LINEST(LN(Yrange), XRange)), and this line is equivalent to the y = a.e^bx line returned by the chart.

Update 27 Jan 2011:

Coincidentally, Chandoo at Pointy Haired Dilbert is also running a series on estimating trend lines in Excel, which is well worth a look at: Are You Trendy

Pingback: Using Excel statistical functions for trend analysis. | Chandoo.org - Learn Microsoft Excel Online

I would really like to understand how microsoft’s trend line for a +4th order polynomial’s trend line always comes out smooth? What do they do in their algorithm that is different from the Givens (Least Squares) method that almost any other curve fitting program can duplicate up to a 3rd order polynomial (well, the coefficients match at least), but after that MS Excel’s formula’s simply don’t match any other results???? And the kicker is that if you plot their solution, the trend line ALSO does not come out the same? MS Excel has become a “standard” for regression analysis in non-linear systems, so it would be really appreciated if they would provide insight into their algorithm’s.

Scott – can you give an example? With the same data I used for the curves up to cubic above (i.e. a circular quadrant with 11 points and radius 1), for a quartic I get:

-5.03771915 8.14914828 -4.68693650 0.64996956 0.99330532

with descending powers of x.

I get exactly the same coefficients from the chart trend line.

This is with Excel 2010, but I believe that the algorithm is unchanged since 2003. There are some acknowledged issues with earlier versions.

I’ll have a look with different software later, but that will probably have to wait to the weekend.

I’m referring to polynomial trend lines greater than 4th order. I’m trying to develop a program to curve fit signals with very steep sideband slopes, and when I use Excel, the skirts are well adjusted to the curve (6th order polynomials in this example), but the trend line itself is incredibly smooth, and I have significant sign changes throughout the data, which should be reflected in the curve fit as well, and simply isn’t. When I compare polynomial coefficients, they too are significantly skewed, its almost as if MS Excel smooths the data, or performs some other optimization that I can not account for in my algorithm, however, if you take the trend line formula provided in Excel and plot it against the data in a different plotting tool (Matlab, LabVIEW, etc…) the curve fit line does not appear to resemble the plot in Excel. So, I’m really curious as to how accurate the curve fit in Excel truly is, it could be as simple as different accuracy weighting as well, but I am struggling finding a solution that matches Excel.

Scott – having looked at it again, there does seem to be a bug in the Linest results with polynomials of order 6 and higher. I have just fitted a polynomyial to the function y = 1/(x-4.99) for 100 points between x = 5 and x = 6. I used Linest, Excel chart trendline fitting, and the Alglib PolynomialFit routine.

I found that up to 5th order they all gave esentially the same results (but with rounding differences becoming significant at 5th order). For 6th order the chart line and the Alglib line were very close, but the Linest line still followed the same form as the 5th order (i.e. maxima and minima were at about the same positions). For 7th order the Alglib line changed (as you would expect), but the Linest line again stayed close to the 5th order position, and the chart line was past its limit.

I will write this up and post a UDF with the Alglib routine in the next few days. Drop me an e-mail (dougaj4 at google) if you would like a copy of the spreadsheet as it stands. You might also like to have a look at the Alglib site which has some information about how they approach the problem.

The only discussion of a similar problem I found in a quick search was here:

http://www.officekb.com/Uwe/Forum.aspx/excel-chart/13526/Excel-2007-Polynomial-Order-Incorrect

The responses were not very helpful unfortunately.

Doug – I don’t think it’s a bug but a consequence of collinearity see http://support.microsoft.com/kb/828533. A column may be excluded if the sum of squared residuals on a regression of the other predictor variables is small, see http://en.wikipedia.org/wiki/Multicollinearity. Unfortunately, it’s not clear from the description exactly how linest chooses in which order to exclude columns.

An equivalent of LINEST(Y,X) that doesn’t exclude columns is:

=MMULT(MMULT(TRANSPOSE(Y),X^N),MINVERSE(MMULT(TRANSPOSE(X^N),X^N)))

where N={1,0} for a linear fit, N={3,2,1,0} for a cubic fit, etc (X non-zero). However it’s not as numerically stable as the QR decomposition method that linest uses.

Lori, would you mind ellaborating on your collinearity theory, I’m not understanding why that would cause the problem I am seeing.

Scott – The remarks were in response to Doug’s specific example, i can’t claim that they are necessarily relevant in your case but it does show that the algorithms used may become significant for higher order polynomials.

In my tests fitting a 6th degree polynomial gave an x^5 coefficient of zero for linest. This accords with the kb article since the results of linest(x^5,x^{1,2,3,4,6},,1) show that 1-R2=RSS/TSS<1e-16 so x^5 is omitted.

It’s also interesting to compare with fitted values calculated from mmult(X^N,transpose(b)) where b is given by the formula i posted above. For N={6,5,4,3,2,1,0} there is close agreement with linest but not with the chart line. If there is a follow up post a chart of this may shed more light.

Pingback: Non linear regression – 1 | Newton Excel Bach, not (just) an Excel Blog

I have used this function often for the pricing of fixed income securities, is is possible to modify it to allow for missing data? If for instance I do not have data in either the X or Y sets for specific row in my spreadsheet?

Jeff – To find a y-value for a given x-value using a fitted cubic curve, you can try:

=TREND(Yrange,Xrange^{1,2,3},Xvalue^{1,2,3})

Or to find an x-value for a given y-value you can try to find the root of the cubic using:

=IRR(LINEST(Yrange,Xrange^{1,2,3})-{0,0,0,1}*YValue)*1+1

These formulas are easily extended to other powers but you may be better off following the posts on splines for interpolating missing data.

Lori, can you explain more as to exactly what IRR does? Office just says that it is used to find the Internal Rate of Return. I think this is the solution I was needing for my problem but I want to make sure. I have data that I want to find a 5th order polynomial equation to. Then, given a Y-value, find the X-value. The answers that I’m getting using this formula seem to be reasonable, I just don’t know enough about the function of IRR to be confident its what I need.

Tom – I’ll be interested to see Lori’s comments if she drops by, but my explanation of how it works is given here:

http://newtonexcelbach.wordpress.com/2011/12/01/linest-npv-irr-and-solving-polynomials/

Doug

Doug – Thanks for the explanation which is much more comprehensive than mine would have been!

Tom – As a check, try inputting the X Value returned into the TREND formula above and verify that the result equals the original Y value. In fact, a slightly simpler verion of the formula is:

=IRR(LINEST(Yrange-YValue,Xrange^{1,2,3}))*1+1

I would like to add however, that while polynomial approximations can be very useful in theoretical analysis, there are rarely compelling reasons for choosing polynomial fits with empirical data. Models should be based on a priori assumptions as far as possible to avoid problems of data-snooping. If you need a nonlinear approximation for estimation purposes there are a multitutde of other smoothing methods that are often preferable.

Actually on rereading, the question is not directly about finding missing values but rather

how to allow for missing values in data?For a linear fit, SLOPE, INTERCEPT, RSQ and FORECAST skip rows containing blanks. For a nonlinear fit, it’s more challenging since for example LINEST(Y,X^{1,2,3}) errors if there are any blanks in the range. One approach is to try instead:

=LINEST(ISNUMBER(X)*Y,IF(X<>0,ISNUMBER(Y)*X^{0,1,2,3},0),0)

The results should match the values given in the chart trendlines. An extension that also allows for filtered data is to replace ISNUMBER(X) in the formula above by SUBTOTAL(3,OFFSET(X,ROW(X)-MIN(ROW(X)),,1)) and the same for ISNUMBER(Y). Similar substitutions can be applied to other formulas and you can assign names to the expressions for X and Y to keep things simple.

Clearly, there are other ways to achieve the same results, the obvious one being to make a copy without the rows containing the missing values, however this either needs to be done manually or a macro setup to do it for you each time which is less efficient.

Pingback: Using LinEst() on data with gaps | Newton Excel Bach, not (just) an Excel Blog

Jeff: see http://newtonexcelbach.wordpress.com/2011/05/14/using-linest-on-data-with-gaps/

Lori – thanks for the on-sheet solutions. I’m still trying to work out how the second one works!

Pingback: Data Analysis using Linest and the Data Table function. | Chandoo.org - Learn Microsoft Excel Online

I need to use linest to find the coefficients ‘a’ and ‘b’ that fit the curve y=a(1-exp(-bx)) to my data set. Judging by the comments here there are some clever people who would know how to do that, however I’m not one of them, it has me stumped! Thanks to anyone who can offer any help…

Malcolm – as far as i know the easiest way is using the Excel Solver, as given by electricpete at eng-tips:

http://www.eng-tips.com/viewthread.cfm?qid=311283

To use Linest you have to be able to convert the function to a linear function of some functions of x.

Pingback: Linest, NPV, IRR and solving polynomials | Newton Excel Bach, not (just) an Excel Blog

Hi all, multi-collinearity causes huge problems in polynomial regression if the range of x-values extends to values much larger than 1 in magnitude. For example, you might want to calculate the correlation coefficients between x, x^2, x^3 and so on if x = 20,21,22,23…,30; they are all close to 1. This can even cause fatal interpretation errors if the estimated confidence intervals are used for some kind of error propagation analysis without accounting for the large co-variances between the regression coefficients. The only way out of that trap is to use orthogonal (or even orthonormal) polynomials like those of Legendre as basis functions. As these are defined on the interval [-1;1], the variables have to be transformed before the regression takes place. And always mind the co-variances between the coefficients in an error propagation analysis, if you work with the coefficients for x, x^2…!

The usage of at least orthogonal polynomials is the only method that allows to reliably detect non-linear relationships far away from the origin, for example, when you want to do a non-linear regression of income on age (30 – 70) or so.

Georg – Good points, some of these were alluded to in comments from the follow-up

post. For computing the coefficients, LINEST/TREND can be applied to data centered around the mean and the results are in close agreement to other high-precision polynomial regression algorithms. The QR/SVD decomposition methods for calculating least squares estimates can be seen as finite dimensional analogs to orthogonal polynomial expansions of L^2-functions.

I would like to see how others graphically show the col-linearity – I have just posted on http://www.excelfox.com how to get a correlation map using colors. The post is under the heading

“Using property ColorScaleCriteria color you cells” (in the download center) – what it shows is a correlation matrix of 3 wet chemistry assays (Y variables) and Absorbencies as the X data — so these are adjacent frequencies (X values) with a very high covariance.

Anybody who has examples of how to graphically show a map similar to what I just posted would be appreciated.

PS – I had to cut out data (max file size is 100kb) – so if you plot the X-values you will see ‘noise’ in the spectra.

Rasm – I couldn’t log in to view your file, but for a visual plot of collinearity in three regressor variables I would plot one variable against a best fit linear combination of the other two. It’s insightful to do this for Doug’s polynomial example mentioned above.

Starting from a blank workbook, here’s a few steps to set up the chart and plot the data, no datasheets or code modules are required. You can just press Alt+F11 and enter sequentially in the immediate window, the corresponding UI commands should be self-evident.

charts.add

activechart.seriescollection.newseries

set s=activechart.SeriesCollection(1)

s.Type=xlXYScatter

names.add "x", [4.99+row(sheet1!1:101)/100]

names.add "y", "=x^3"

names.add "z", "=trend(x^3,x^{1,2})"

s.formula = "=series(,sheet1!y,sheet1!z,1)"

s.trendlines.add DisplayRSquared:=True

s.trendlines(1).datalabel.numberformat="0.000000000000000"

This gives a near exact straight line with R^2=0.99999946. Choosing Debug>Add Watch with [linest(x^3,x^{1,2})] also gives the same value in the (3,1) element. Extending to [linest(x^6,x^{1,2,3,4,5})] gives R^2=1 exactly to 15dp, so x^6 coefficeint is dropped from the calculation of coefficients as it adds no more information.

To reduce the collinearity the first step is to center around the mean by changing 4.99 to -0.51 above this gives R^2=0.8401098. The second step is to transform the columns so they are uncorrelated, this is what Excel and other least squares methods do “behind the scenes”. Choosing the cubic Legendre polynomial in place of x^3 as below gives R^2=0.0019898.

names.add "x", [(x-average(x))*2]

names.add "y", "=2.5*x^3-1.5*x"

Lori

I will try your method – but I have several 100s X values – up 1050.

I do preprocess the data i.e. SNV and typically a 1st derivative (with a smooth and a gap) – next I mean center the data. I typically find the best model using PLS or MLR. The data I work with are spectra – that is why I have extreme col-linearity – the dependent variable is typical a concentration. But I do find the approach described in this threat very interesting. Again thanks for your reply – I will try your method – may give me some inspiration.

In order to compare the extent of collinearity of two vectors V1 and V2 to our everyday experience of Euclidian sapce, it might help to calculate the angle A12 between them according to the formula

A12=180/PI()*ARCCOS(SUMPRODUCT(V1,V2)/SQRT(SUMPRODUCT(V1,V1)*SUMPRODUCT(V2,V2)))

A result of 0 means totally collinear and 90 means totally independent.

Hi,

I want to find out the trendline for a set of data (x,Y) but I want to get the trendline in the form of sin(x) and cos(x). However, using the excel it is only possible to have it in the form of linear, power, .. but not the sin or cos(x). How you think I can solve it?

Well asafa, if you really meant A*sin(x)+B*cos(x), you could use linest because your model is linear in the parameters. But I guess you want to solve something like A*cos(B*x)+C*cos(B*x) in order to determine a spectral component. As this model is non-linear in the parameters, you have to use a non-linear least squares method, for example, Excel’s solver. My experience of 20 years of NLLSQ fits lets me strongly recommend to use VBA to interface to an external DLL that allows for suppying the derivatives of the model function with respect to the parameters analytically. Doug has some posts on how to interface to Fortran. The procedure for interfacing to C is quite similar. There are free NLLSQ routines available for download on the web (NetLib, AlgLib,…). You could use the free CAS Maxima, e.g., in order to determine the analytical derivatives if your models become more complicated than just being the sum of a sine and a cosine.

asafa and Georg – see:

http://newtonexcelbach.wordpress.com/2011/03/01/non-linear-regression-1/

and

http://newtonexcelbach.wordpress.com/2011/03/05/non-linear-regression-%e2%80%93-2-alglib-functions/

for post on using the Excel Solver and the Alglib routines for non-linear regression in Excel.

I agree with Georg that for anyone doing serious work on this the purpose written routines such as those from Alglib offer much better performance than using Solver.

Hi everbody, great site!!!. I am working in forcasting project, and its so hard to read in the chart the exactly number of the trend, I am looking for information, how to get the values of the linear, polynomial, exponential etc. lines but in values, for example my data is A1:A20 so I would like to see on B the linear values, C logarithmic, D polynomial etc. how can I do this, please any help welcome.

Baum – Have a look at the download spreadsheet. It contains the examples shown in the screenshots which returns the data you want. If anything isn’t clear, please ask.

anyone knows any video link to see exactly how to get the coefficients trend lines.

Hi,

My name is Zack. I was wondering if there was any way to program an equation with a missing variable in excel, and have it calculate for the missing variable.

I.E.) X=Vo*t + 1/2*a*t^2 when knowing what x, Vo, and a are equal to, to try and find what “t” is.

or I.E.) V=Vo + a*t knowing Vo, a, and t to find “V”

If anyone could tell me how to do this with several other equations, I would greatly appreciate it.

my email is “zackgane@yahoo.com” thank you

Zack – for your first example you can use the Excel Goal-seek function (under the Data tab in 2007/2010). Also have a look at the Solver which gives more control and can solve more complex problems (more than one unknown for instance). There are also some posts here with UDFs to solve polynomial equations, which will do the job more quickly and conveniently than Goal-seek if you have a lot of them:

http://newtonexcelbach.wordpress.com/2010/08/04/solving-cubic-and-quartic-equations-with-excel/

http://newtonexcelbach.wordpress.com/2011/01/13/solving-higher-order-polynomials/

For the second example you can just enter the formula in a cell to find V, but maybe you meant to find t? In that case you can use the same methods as for the first example, but some simple algebra will give the result: t = (V-Vo)/a. There is a formula solution to the first example as well; look up quadratic formula.

Pingback: 4 Year Report | Newton Excel Bach, not (just) an Excel Blog

Hi

This is shiva

I have some points that should fit in a bell curve..

What i do to get the ordinates for any point.

Email. shiv_yers@hotmail.com

hi, i’m a final year civil engineering student and i’m doing a spreadsheet for the design of steel components using microsoft excel.. does anyone know how to generate the bending moment and shear force diagrams in excel..? thanks :)

Hawwa – have you tried the Internet?

There are a stack of programs that will generate bending moments and shear forces, with different levels of complexity.

If you are looking for a continuous beam analysis you could start here:

http://newtonexcelbach.wordpress.com/2012/05/03/using-udfs-continuous-beam-example-3/

Pingback: Daily Download 14: Curve Fitting 1 | Newton Excel Bach, not (just) an Excel Blog

Noob question and perhaps attributed to a superficialy understanding of arrays in excel (amongst other things) but for the life of me, I can’t understand why in the logarithmic example, the forumlas in cells O55 and P55 return different values but appear to be exactly the same. What subtle point am I missing?

Scott – yes it is the use of array formulas that is confusing if you are not used to them.

The two cells contain the same formula, but it returns an array rather than a single cell, in this case the array is 1 row x 2 columns. The values are a and b in the formula

a(ln(x)) + b. You can see that the values match those of the trend line in the chart starting at C50.

To return both results of the array:

- Enter the formula in O55 in the usual way.

- Select O55:P55

- Press F2

- Press Ctrl-shift-enter

You should now get both results, and the formula will display in the edit line with {} around it, the same in both cells: =LINEST(Y_3,LN(X_3)).

You can also select O55:P55 to start with and enter with ctrl-shift-enter.

More details at: http://newtonexcelbach.wordpress.com/2011/05/10/using-array-formulas/

Thanks for the quick reply :) Last question would then be whether its possible to actually have all of this in a single cell (i.e. a single formula that would return the value of y in a cell by calculating a(ln(x)) + b) or do I essentially have to calculate a and b in separate cells like in your file and then do another cell to calculate y?

Scott – it’s possible but the formula would get quite long. You can use the Index function to return any value from an array, the same as if it was a range, so:

=INDEX(LINEST(Y_3,LN(X_3)),2) will return the b value.

you would end up with something like:

=LINEST(Y_3,LN(X_3)) * LN(x) + INDEX(LINEST(Y_3,LN(X_3)),2)

But personally I’d rather return the results of the array formula in two cells, and use a third cell to get the desired result.

Thanks for the information. I want to ask one thing that in power series trend line, if the x axis is logarithmic will there be any effect on the values. Please reply.

The easiest way to check is to try it and see, but the answer is no, making the scale logarithmic only changes the way the graph is plotted, it doesn’t change the trend line results.

Your blog is very useful. As a new VBA user I’m having touble to extend this NL fit case a simple on variable case. Have tried bchanging both AL_NLFit and AL_NLFitText without success. I hit a wall that says : cannot change part of an array. I even tried tp create a new one with the insert function for yusing UDF but could not get it to work.

Can you please help me? I’m a curious student in BA and I want to learn how to this. Thanks in advamnce.

Have a look at:

http://newtonexcelbach.wordpress.com/2011/05/10/using-array-formulas/

If you still have problems after reading that you could send a sample file to dougaj4 at the usual google mail address.

I did it and it works!

Thank you very much!

Thanks for the information! I have a question, how do I calculate the b, c and d for the Cubic curve estimation example? When I use the formula =Linest(Y_1,X_1,^{1,2,3}) I get the number presented as a (-2,08199), but how do I calculate the numbers b, c and d?

Lenn – sorry for the delay in replying.

You have to enter the Linest function as an array function to display all the results.

If you already have the function entered and displaying the first result then:

-Select that cell and the three adjacent cells to the right.

-Press F2 to enter Edit mode.

-Press Ctrl-shift-enter

The four results should display in the selected cells.

Thanks for the explanation! That makes sense!

Thank you for the file.