Trendline coefficients (2024)

Recent Comments

From KSM on April 25, 2012

This page had helped me in understanding the regressions in Excel. Thanks a lot!

From M.K. on April 26, 2011

This page is an excellent blend of regression theory and application with Excel. It was just what I needed to solve my problem.Thanks!

Introduction

A trendline shows the trend in a data set and is typically associated with regression analysis. Creating a trendline and calculating its coefficients allows for the quantitative analysis of the underlying data and the ability to both interpolate and extrapolate the data for forecast purposes. It is probably best to illustrate the problem with a simple example.

Consider monthly sales as shown in Table 1.

Month

Sales

1

3010

2

4500

4

4400

6

5400

7

7295

8

8195

Table 1

Data for two months are missing but in general, it would appear that there is an upward trend with a growth in sales as time goes by. However, beyond that what can we conclude? Can we estimate the sales for month 9? Or, can we tell the rate at which sales are increasing? Not really. However, with a systematic analysis of the data, we can get a lot more information.

A good start would be to plot the data as shown in Figure 1. Now, the upward trend is a lot more obvious.

Trendline coefficients (1)

Figure 1

With a trendline we can quantify that observation. To add a trendline, select the plotted series, and then Chart | Add Trendline… | Type tab | select Linear and from the Options tab | check ‘Display equation on chart’

Trendline coefficients (2)

Figure 2

In addition to the trendline, Figure 2 also contains horizontal line showing the average of the sales data. From the trendline, we can conclude that on average monthly sales increase by 650 units each month. But, that is about the limit of the information. It is important to note that extrapolating too far out beyond the existing data set is not a very smart thing to do. It would be OK to estimate sales in each of the missing months: 3 and 5. Estimating for missing data within the overall data set is known as interpolation. By contrast, estimating sales for months outside the data set is known as extrapolation. It might be OK to make an estimate for “nearby” months such as month 9 or maybe even 10. However, it would be foolhardy to conclude that a year from now sales will have risen by 650*12 or 7,800 units. The limit of what one can do with a regression is further addressed in the section on Confidence Intervals.

While the trendline is of some value, if we were to use the information in any analysis, it would help to have the data in worksheet cells and not as text in a chart. In addition, consider the case in which the dependent variable (sales in the above example) was a function of not one independent variable (month in the above example) but multiple variables? Known as a multiple regression, a graphical analysis would require what cannot be done in Excel: an n‑dimensional graph. The section on Linear regression with multiple variables addresses how this can be done in an Excel worksheet.

Excel includes multiple functions for regression analysis. We will look at LINEST in detail. Once we understand the LINEST function, LOGEST will not be too difficult to understand. In addition to using LINEST for linear estimates, we will also see how its name is misleading. It can actually be used for a large variety of different types of trends such as polynomial, logarithmic and exponential.

The previous chart showed a linear trend with one independent variable. What other kinds of trends are there? Figure 3 shows five different types: polynomial of order 3, linear, logarithmic, natural exponential, and power.

Trendline coefficients (3)

Figure 3 – Note that what Excel calls an exponential trendline is strictly speaking a natural exponential trendline of the form Trendline coefficients (4). Excel doesn’t have the capability of drawing the more general exponential trendline of the form Trendline coefficients (5)

Other types of trendlines would include the trigonometric functions Trendline coefficients (6)or Trendline coefficients (7), as well as the exponential function, Trendline coefficients (8). While Excel includes a special function to analyze the exponential regression – the LOGEST function – we will see why, strictly speaking, it is not necessary.

The rest of the case study is structured as follows:

  • The basic linear regression
  • Understanding the result
  • Limits of regression analysis
  • Over-specifying a regression – a common mistake
  • Confidence Intervals
  • Regression analysis with other trend functions
  • Linear regression with multiple variables
  • Regression with polynomials
  • Regression with indicator (dummy) variables
  • Comparing chart trendline values with LINEST results
  • Implications for different versions of Excel
  • LINEST and VBA
  • Summary

The basic linear regression

A linear regression fits the line Trendline coefficients (9), or as Excel prefers to call it Trendline coefficients (10), to the existing data set. It does so through a technique known as minimizing the sum of the squares of the error terms. To get the complete result of a regression analysis, select a range 5 rows by 2 columns and array-enter the LINEST function as shown in Figure 4. The first row contains the 2 coefficients a1and a0 respectively.

Trendline coefficients (11)

Figure 4

Using regression analysis yields a lot more information than the trendline’s coefficients. The rest of the information is important in understanding how well the regression line fits the data, how significant the individual coefficients are, as well as the significance of the regression as a whole. It also contains key elements needed to build confidence intervals for interpolated or extrapolated estimates, a subject covered in the section titled Confidence Intervals.

But, first, we start with some nomenclature. The number of data points is given by n. The number of independent variables is given by k. If a constant is included in the regression, it increases k by 1. Figure 5 is a zoomed-in version of Figure 2 focusing on the data point (4,4400). Each of the recorded observations is denoted by the pair of valuesTrendline coefficients (12). For each Trendline coefficients (13), the value predicted by the regression is given by Trendline coefficients (14). Trendline coefficients (15)is pronounced “y hat.” The average of all the observed y values is known as Trendline coefficients (16), pronounced “y bar.”

Trendline coefficients (17)

Figure 5 – Enlarged view of Figure 2 around the data point (4, 4400)

Lacking any information about the regression one could still estimate the average of all the y’s, i.e., Trendline coefficients (18). That’s a “first cut” estimate that we seek to improve with a regression. The value that one gets from the regression is Trendline coefficients (19). As Figure 5 shows, the total gap between the observed data point and the “first cut” estimate is given by the deviation Trendline coefficients (20)- Trendline coefficients (21). Out of this the regression accounts for the amount Trendline coefficients (22)Trendline coefficients (23). This leaves a residual (also known as an error) of Trendline coefficients (24)Trendline coefficients (25)unexplained by the regression.

With the nomenclature established, we look at the result of the LINEST function. For reasons that will soon be apparent, we start with the last row. There are two values Trendline coefficients (26)and Trendline coefficients (27). These are aggregate measures of something we have already looked at the level of an individual data point. Recall that for each individual data point, the measure of how much the regression explains is Trendline coefficients (28)and how much remains unexplained is Trendline coefficients (29). Trendline coefficients (30)and Trendline coefficients (31)are aggregate values of the same metrics. The first is the sum of the squared values of how well the regression fits the data or Trendline coefficients (32). The second is the sum of the squared values of how much remains unexplained or Trendline coefficients (33).

Row 4 contains two values: the F statistic and the degrees of freedom, df. The degrees of freedom is given by the expression n-k, where n and k are explained earlier in this section. The F statistic, or the observed F-value, is a measure of the significance of the regression as a whole. For the technically minded it tests the null hypothesis that all of the coefficients are insignificant against the alternative hypothesis that at least one of the coefficients is significant. While Excel provides the value, it can also be computed as Trendline coefficients (34). This, the observed F-value, is then compared against a critical F-value, F(a, v1, v2), where ais 1 - the level of significance we are interested in, and v1 and v2are as calculated below. The typical level of significance one is interested in is 95%. Hence, a = 1-0.95 or 0.05. If the constant term in included in the regression (the LINEST parameter const is either TRUE or omitted), v1 = n – df – 1 and v2 = df. If the constant term is excluded from the regression (the LINEST parameter const = FALSE), then v1 = n – df and v2 = df.). Excel’s FINV function, used as FINV(F, v1, v2), provides the critical F-value. If the observed F-value is greater than the critical F-value, it means the regression as a whole is significant.

Row 3 contains the two metrics, R2 and theSEreg. The R2 is measure of how well the regression fits the observed data. It ranges from 0 to 1 and the closer to 1 the better the fit. Mathematically, it is calculated as Trendline coefficients (35), where each term is explained above. Graphically, in terms of Figure 5, it is a measure of how close the regression line is to all of the observations. Suppose the regression line were to pass through every observation. Then, SSresidual would be zero since each of its components Trendline coefficients (36)would be zero, and R2would become SSregression/SSregression or 1. The second item in this row is the Standard Error of the regression, or SEreg. It can also be calculated from what we already know, i.e., Trendline coefficients (37). Keep in mind that n-k is also the dfvalue in row 4 of the result. SSreg will play a role in calculating the confidence intervals later in this chapter.

Row 2 provides the standard error of each coefficient, or Trendline coefficients (38). The section Understanding the result addresses how these errors help determine if the coefficients are significant.

Understanding the result

LINEST can return more than just the coefficients of the regression. Used as an array formula in a 5 rows by X columns range, LINEST returns not only the coefficients but also other statistical information about the results. Some might find it surprising but the Excel documentation for LINEST does a very good functional job of explaining not only contents of all the rows but also the statistical value of that information. Nonetheless, we will look at one key element of the result that bears repeating – it is overlooked by way too many users of LINEST. Figure 6 shows the result of some linear regression[1] using LINEST.

Trendline coefficients (39)

Figure 6

When array-entered in 5 rows (D2:E5), LINEST, as before, returns the coefficients in row 1. In row 2 it provides the standard errorof each of the coefficients. Taken together, the two rows contain critical information in estimating whether each of the coefficients is statistically different from zero. Dividing the absolute value of the coefficient value by the standard error yields what is known as the observed t-result, or Trendline coefficients (40)Comparing the absolute value of this t-result with the corresponding critical t-value lets one decide whether that coefficient should be treated as zero. If the observed t-result is lessthan the critical t-value then statistically the coefficient is the same as zero. In the above example, the t-result for the a1 and the a0 (constant) terms are: Trendline coefficients (41), respectively. The critical t-value is calculated with the formula =TINV(0.05, df.), where 0.05 corresponds to the 95% confidence level and d.f. is the degrees-of-freedom for the regression. In the above example, d.f. is given by E4. The result of the formula, i.e., the critical t-value, is Trendline coefficients (42). The absolute value of the observed t-result for the coefficient (a1) is much greater than the critical t-value. Hence, we can conclude that the a1term is statistically significant. On the other hand, the absolute value of the observed t-value for the a0 (constant) term is 2.148. Since it is less than the critical t-value, we conclude that, statistically speaking, the constant is zero.

Since the constant term is statistically indistinguishable from zero, one should run the regression with the constant term forced to zero. This yields the result in Figure 7.

Trendline coefficients (43)

Figure 7

Note that the R2 value has actually increased from 99.65% to 99.88%! Hence, removing the value that was statistically indistinguishable from zero has improved the R2 value. Of course, this may not always be true, but removing a term that is statistically the same as zero is always a good idea.

Limits of regression analysis

Regression analysis is not magic. It is a tool that is only as good as the underlying data used in the analysis and as good as the person using it. Ultimately, there is no substitute for a better understanding of the underlying cause-and-effect that the regression is meant to analyze as well as a better understanding of the real world elements being modeled in the regression.

A question that arises on occasion is “what kind of trendline should I pick?” The answer is that the analyst has to have some understanding of the process under analysis. For example, in the example used in the beginning of this chapter, we modeled the sales volume as a linear function of the month. While it made sense for the range of vaues we had, if we were working over a longer period, it is quite possible that a different model, such as a logarithmic regression, might have been more appropriate.

A few other examples that come to mind will demonstrate the importance of paying attention to the underlying issues that affect the analysis:

In one case, someone had 5 years of data for the number of applications and the number of admissions to a college. Based on this data, her boss had asked her to project the number of applications and admissions for the next 5 years. Extrapolating five years of data this far out into the future is simply dangerous. Any number of factors can affect application and admission rates to a college and ignoring the various possible causes is a prescription to a disaster.

In a consulting assignment, I was asked by a regional utility to help analyze the pattern of calls to its call center. As one can imagine any number of variables go into such calls. Among others, there were seasonal trends, annual trends, day-of-week patterns, storm-related surges in call volume, etc. One of the challenges was to better understand call patterns through the week. The obvious and intuitive thing to do was establish a ‘day of week’ indicator. However, that did not yield statistically significant results. After talking with the call center management and the call center operators, I realized that the key was not the day-of-the-week but ‘operating day of week’ – a term that accounted for holidays. What that meant was that if Monday was a working day, it would be operating-day-1. However, if it was a holiday, then for that week Tuesday would be operating-day-1. Once we factored in this kind of reasoning, together with a similar flag for ‘last-working-day-of-the-week’ the independent variables immediately became statistically significant!

The final example is a classic from an introductory MBA class on statistics. Suppose one were to do either of the following two regressions: (1) the number of men buying ice cream each day as a function of the number of women wearing bikinis that day, or (2) the reverse, i.e., the number of women buying ice cream each day as a function of the number of men going shirtless that day.[2] The result will show a significant statistical correlation. But does it make sense? No, not really. The events are correlated, but neither is the cause of the other. In fact, the true cause is missing from the analysis – the average daily temperature! In summer, when the outside temperature is high, both men and women buy more ice cream and wear fewer clothes. When the temperature drops during the winter months, people wear more clothes and buy less ice cream.

These examples should highlight the risks of carrying out a regression analysis without truly understanding the underlying factors that affect the analysis. If one is lucky the results will be so absurd that they will be rejected out-of-hand. If one is unlucky, the result may lead to bad organizational or individual decisions.

Research this issue further by searching Google or by visiting Gary Klass’s
“Interpreting the Numbers. (under construction)” at http://lilt.ilstu.edu/jpda/interpreting/interpreting_the_numbers.htmor Gerald Bracey’s “THOSE MISLEADING SAT AND NAEP TRENDS: SIMPSON'S PARADOX AT WORK” at http://www.america-tomorrow.com/bracey/EDDRA/EDDRA30.htm

Over-specifying a regression

Given how easy it is to add a polynomial trendline to an Excel chart, it is tempting to always ask for a high order polynomial as the curve of best fit. The number of requests one sees in the Excel newsgroups where someone asks for a cubic polynomial or even a sixth-order polynomial is quite high. However, it is usually not a good practice to over-specify a regression since it can lead to results that are numerically unstable. This over-specification is easy to identify if one uses LINEST to find the coefficients of a polynomial best-fit and then tests if the coefficients are statistically significant.

Suppose one carried out a quadratic polynomial regression[3] for the data from the previous section. The result of the LINEST function would be:

Trendline coefficients (44)

Figure 8

The absolute value of the observed t-valuecorresponding to the x2 term is 1.355. The t-critical valueis 2.1788. Hence, statistically speaking, the coefficient of the x2term is indistinguishable from zero. What this tells us is that asking for a quadratic polynomial best-fit over-specifies the regression. Removing the x2term leads to the result of Figure 6, which, in turn, leads to the result of Figure 7.

Confidence Intervals

Among the uses of a regression is the ability to estimate missing information as well as information outside of the range of data analyzed. In the example from the first section, one might want to estimate the sales volume missing in months 3 and 5. Or, one might want to estimate sales in months 9 or 10, or even further out into the future. Note the emphasis on “estimate.” Clearly, one cannot state with certainty that the sales volume in month 3 must have been a particular amount. Similarly, it is hard to imagine a certain prediction about sales volume a year from now. However, we can do two things. First, provide what is called a point estimate. Second, provide a range of within which we can state with some certainty the actual value will fall. This information is often shown graphically as in Figure 9. The lower and upper curved lines demarcate the range of the 95% interval whereas the regression line itself represents the point estimate. Hence, for month three the best point estimate would be 4,383 units. For most practical purposes there is a 95% probability that the actual value was between the upper and lower CIs. For month 3 those values would be 3,363 and 5,404 respectively.

Trendline coefficients (45)

Figure 9

The calculations required to plot the three lines are described below. The predicted line is the same as the regression line and each y value, yj, is calculated as described in The basic linear regression, i.e., Trendline coefficients (46). The distance of each confidence interval point from the regression line is given by the equation Trendline coefficients (47), where CIj is the value for the value of interest xj and xi represents the known observations. SEreg is one of the values returned by the LINEST function and explained in The basic linear regression section. Excel provides a function, DEVSQ, to compute the sum of the squares in the equation.

Next, we compute these values in Excel. The original data set is shown in Figure 10. Only four of the regression related values are shown – a1, a0, SEreg, and the two-side 95% t(df) value.

Trendline coefficients (48)

Figure 10

Next, we calculate the predicted value and the confidence interval values for the months 1 through 20.

Trendline coefficients (49)

Figure 11

H3:K3 contain the formulas shown below.

H3

=$E$3*G3+$E$4

I3

=$E$6*$E$5*SQRT(1/COUNT($A$3:$A$8)+(G3-AVERAGE($A$3:$A$8))^2/DEVSQ($A$3:$A$8))

J3

=H3-I3

K3

=H3+I3

Copy cells H3:K3 rows 4:22. Graph H, J, K as the y values and G as the x-values in a XY Scatter chart to get

Trendline coefficients (50)

Figure 12

Note how much additional uncertainty is introduced the further we get from the original data set, which had values only for the first 9 months.

Regression analysis with other trend functions

As shown in Figure 3, there are many different types of trendlines possible. Each reflects a different relationship between the independent and the dependent variables. Some trend functions of a single variable – other than a linear or a polynomial trend – are listed in the table below.

Logarithmic

Trendline coefficients (51)

Power

Trendline coefficients (52)

Exponential function to base b

Trendline coefficients (53)

Natural Exponential function

Trendline coefficients (54)

Trigonometric function

Trendline coefficients (55)

Table 2

The key to understanding this section (as well as the one on polynomial trends) is realizing something that apparently the Microsoft people who documented LINEST overlooked: the linearity required in sum-of-square best fit regression analysis applies not to the variables but to the unknown coefficients! Hence, as long as we can write a function in a form that is linear in the unknown coefficients, LINEST can be used to analyze the data. Keep this in mind as we look at how to transform the above functions into a form amenable to analysis by LINEST. Also, the data for the examples for the different trendlines are in Figure 13. This is the same data set used for the trendlines in Figure 3. Hence, for those functions for which Excel can create chart trendlines, the reader can compare the results below with the chart trendline results.

Trendline coefficients (56)

Figure 13

We start with the logarithmic trendline function,Trendline coefficients (57). It is already in a linear form (Trendline coefficients (58)). Hence, given the x and y values, =LINEST(y-range, LN(x-range)) will give the required a (slope) and b(intercept) values.

Trendline coefficients (59)

Figure 14

The power function Trendline coefficients (60)can be transformed into Trendline coefficients (61)[4]. As in the log trendline, given x and y values, using LINEST with this transformed function means that =LINEST(LN(y-range), LN(x-range)) yields ln(a) as the intercept and b as the slope. Hence, for the original regression awill be EXP(intercept) and b will be the slope itself.

Trendline coefficients (62)

Figure 15

The exponential function to base b, Trendline coefficients (63), can be transformed into Trendline coefficients (64). Using LINEST with this transformed function means that =LINEST(LN(y-range), x) yields ln(a) as the intercept and ln(b) as the slope. Hence, for the original regression a=EXP(intercept) and b=EXP(slope).

Trendline coefficients (65)

Figure 16

Finally, the natural exponential function Trendline coefficients (66)can be transformed into Trendline coefficients (67). Using LINEST with this transformed function means that =LINEST(LN(y-range), x) yields ln(a) as the intercept and bas the slope. Hence, for the original regression a=EXP(intercept) and b=slope.

Trendline coefficients (68)

Figure 17

Linear regression with multiple variables

We can easily extend the above case with its single independent variable to include multiple independent variables. When the dependent variable is a function of multiple independent variables the problem is called multiple regression[5]. In mathematical terms, Trendline coefficients (69), where each xrepresents an independent variable and the various m’s are the coefficients from the regression (m0 is the regression constant). One example would be the assessed value of a commercial building as a function of floor space, number of offices, number of entrances, and age. This is from the Excel help file and is duplicated here to provide a sense of continuity between the help and this document. In addition, it serves as a stepping stone for the next section, which is a type of regression that is frequently requested.

In the commercial building example, the y would be the value of the building and the x’s are the respective independent variables as shown in Table 3.

Variable

Refers to the

y

Assessed value of the office building

x1

Floor space in square feet

x2

Number of offices

x3

Number of entrances

x4

Age of the office building in years

Table 3

Suppose the value of the commercial building and the corresponding values of the independent variables are known (see Figure 18).

Trendline coefficients (70)

Figure 18

Hence, the regression equation would be Trendline coefficients (71). In this case, the regression result of LINEST is shown in Figure 19

Trendline coefficients (72)

Figure 19

The number of columns is one more than the number of independent variables, the extra column for the regression constant. Left to right the coefficients are m4 to m0. Hence, -234.2372 is the coefficient of x4, the age of the office building and 27.64139 is the coefficient of x1, the floor space.

For an explanation of the various rows see the earlier section on ‘Understanding the result.’

Returning to a recurring theme, does the result make sense? What is the sign of each of the coefficients? Do you agree with the sign? The coefficients for the floor space, the number of offices, and the number of entrances are all positive (and based on the observed t-value statistically significant). That means that more entrances (or more floor space or more offices) means a higher value for the building. The age of the building, on the other hand, has a negative (and significant) coefficient. That means that the larger the age, i.e., the older the building, the lower its value. That makes sense, doesn’t it?

Regression with polynomials

In a regression analysis with polynomials such as Trendline coefficients (73)each term is like a different variable, i.e., Trendline coefficients (74). Written in this form it becomes apparent that a polynomial regression is no different from a multiple regression. One could create three columns of data: one with the x values, the next with the x2 values, and the last with the x3 values and then use the same LINEST syntax as for a multiple regression. However, the mechanics can be simplified thanks to Excel’s array formula capability. For example, the solution to a third order (cubic) polynomial regression would be the array formula =LINEST(y-range, x-range^{1,2,3}). The last expression takes the x-range and converts it into three separate vectors, Trendline coefficients (75), Trendline coefficients (76), and Trendline coefficients (77), that are then used for the multiple regression. Using the data of Figure 13, the result of the LINEST function is in Figure 20. Again, one can compare the result of the LINEST function with the chart trendline of Figure 3.

Trendline coefficients (78)

Figure 20

Regression with indicator (dummy) variables

There are many instances when an independent variable does not contain numeric values but rather categorical values. A simple example is the variable Gender, which can contain only one of two values Mor F. In such a case, one represents each categorical valuewith an Indicator (or Dummy) variable that is either 1 or 0 depending on the category value. For example, we could introduce a variable M that would be 1 if Gender equals M. Otherwise, it would be zero.

While it might appear that one should also have another variable that is 1 when Gender equals F, it is not really required and should not be created. After all, M=0 implies F=1, just as M=1 implies F=0. As a rule, the number of indicator variables will be one less than the number of values that the category variable can take.

Another example: imagine an independent variable, Season, that could equal Spring, Summer, Fall, or Winter. We would need only three Indicator variables, say, Spring, Summer, and Fall. Winter would be implied by all three indicator variables being zero.

Sometimes, what might appear to be a numeric variable is actually a categorical variable. Most businesses break up the year into four quarters, typically numbered 1, 2, 3, and 4. While this might make it look like the variable is numeric it is actually a category variable. After all, the quarters could just as easily be labeled A, B, C, and D. Essentially, one must be careful to identify and treat a categorical variable correctly.

This issue of correctly identifying a category variable shows up in the next example. It consists of a made-up data set from someone asking for help. Given the data in Figure 21, we have to forecast the sales in each of the quarters in 2004.

Trendline coefficients (79)

Figure 21

One instinctive approach might be to carry out a simple regression treating the Sales as dependent on the Period. However, that would be a mistake. Before doing anything, as noted at the start of this chapter, a very good idea is to always plot the data. Suppose we lay out the data slightly differently as in Figure 22and create a Line chart as in Figure 23.

Trendline coefficients (80)

Figure 22

Trendline coefficients (81)

Figure 23

There are two patterns that stand out. First, there is a slight increase in sales year-over-year. Second, there is a pattern within each year. The sales start off low in Q1 of a given year, increase slightly in Q2, rise substantially in Q3, and drop a fair bit in Q4, falling slightly further in Q1 of the next year.

The two trends should alert us that we cannot use a single variable Period as the independent variable. The results would be totally incorrect. What we need to do is introduce Indicator variables. Since the Quarter variable can take 1 of 4 values, we need three Indicator variables. Let’s call them Q1, Q2, and Q3. Remember that Q4 is implied by all of Q1, Q2, and Q3 being zero. The resulting table is shown in Figure 24.

Trendline coefficients (82)

Figure 24

Now, select a 5x5 range, say, G3:K7, and enter the array formula =LINEST(E2:E13,A2:D13,TRUE,TRUE) to get the result:

Trendline coefficients (83)

If we use these coefficients to forecast the sales for each quarter of 2004, we would get

Trendline coefficients (84)

To get these values, add the data in columns A:D. Then, in E14 enter the formula =$G$3*D14+$H$3*C14+$I$3*B14+$J$3*A14+$K$3 and copy E14 down to E15:E17. If we plot the resulting data, we get Figure 25

Trendline coefficients (85)

Figure 25

As expected, the forecast for 2004 fits in very well with both the year-over-year and the quarter-by-quarter patterns.

Comparing chart trendline values with LINEST results

For reasons best know to itself, Microsoft opted to implement two different regression algorithms, one for the chart trendline and another for the LINEST function. While the two yield the same result for most data sets, in those few cases where they disagree the chart trendline returns a more accurate result. This is especially true for versions of Excel prior to 2003.

I enhanced code originally posted by David Braden to retrieve the coefficients from a chart into worksheet cells. It has several limitations that are documented within the code itself. The google.com archive is at http://groups.google.com/group/microsoft.public.excel.charting/msg/0eda30f29434786d?hl=en&and is reproduced in Appendix A.

.

Implications for different versions of Excel

With Excel 2003, Microsoft changed the algorithms used in several statistical functions. One of the improved functions is LINEST. Among the improvements is an algorithm with improved numerical stability. In addition, the software does a better job of detecting collinearity.

LINEST and VBA

The power of LINEST is not restricted to Excel. Conceptually, it is available in VBA just as it is in Excel: specify a column vector of y-values and a matrix of x-values together with the optional arguments. For a single variable analysis, the matrix of x-values will be a single column vector. For a multiple regression, the matrix will have multiple columns, one for each independent variable. This is exactly the same setup as in an Excel worksheet. The VBA syntax for a column vector with m elements is Dim aVector (1 to m, 1 to 1) and a matrix with m rows and n columns is Dim aMatrix (1 to m, 1 to n). The result of LINEST will be an 2 dimensional matrix, just as in Excel. A key difference between Excel and VBA is that the latter doesn’t support array formulas. Effectively, in VBA one must forgo the simplicity of an array formula and explicitly set up the matrix necessary for the regression model. For example, to carry out a cubic polynomial regression in VBA we would have create a matrix with three columns rather than use the Excel abbreviation of x-range^{1,2,3}.

In the code below, PolyTrends expects YVals and XVals to be column vectors. Since the subroutine calculates the trend of a polynomial regression, the single column of x-values must be converted into a multi-column matrix. Once the single column of x-values is converted into the matrix with the necessary columns, the PolyTrends function returns as its result whatever is returned by the Linest function. As with the other code examples, functional decomposition and modularity results in self-documented code. As such, the task of converting the x-value column vector into a multi-column matrix is delegated to the XArray function.

Function PolyTrend(YVals, XVals, PolyPower As Integer)

Dim XArr()

XArr = XArray(XVals, PolyPower)

PolyTrend = Application.WorksheetFunction.LinEst(YVals, XArr, True, True)

End Function

The XArray function is shown below. It takes the single column of x values and returns a multi-column matrix. The first column contains values of x1, the next column x2, etc.

Function XArray(InArr, PolyPower)

'The returned value is a matrix that has the same number of rows as _

InArr and as many columns as are needed to contain the powers of x _

specified in PolyPower. _

Since the lower bound and the upper bound of InArr are unknown, the _

use of the LBound() and UBound() functions makes the code much more _

robust

Dim XArr()

Dim I As Long, J As Long

ReDim XArr(LBound(InArr) To UBound(InArr), 1 To PolyPower)

For I = LBound(XArr, 1) To UBound(XArr, 1)

XArr(I, 1) = InArr(I, LBound(InArr, 2))

For J = 1 To PolyPower

XArr(I, J) = XArr(I, 1) ^ J

Next J

Next I

XArray = XArr

End Function

Finally, the PolyTrend function can be used from another routine as shown below. For the sake of simplicity, the testPolyTrend routine gets the x and y values from an Excel worksheet. However, there is no such requirement. The data could just as easily come from some other file such as a text file output by another program. The printResults routine prints the result returned by the PolyTrend function.

Sub testPolyTrend()

Dim Rslt As Variant

Rslt = PolyTrend(Range("G2:G16").Value, Range("A2:A16").Value, 3)

printResults Rslt

End Sub

For the sake of completeness, the printResults routine it is shown below. All it does is write the contents of the Rslt variable to the Visual Basic Editor (VBE) Immediate window, one row at a time.

Sub printResults(Rslt)

Dim I As Integer, J As Integer, sMsg As String

For I = LBound(Rslt, 1) To UBound(Rslt, 1)

sMsg = ""

For J = LBound(Rslt, 2) To UBound(Rslt, 2)

sMsg = sMsg & IIf(IsError(Rslt(I, J)), "#N/A", Rslt(I, J)) & ", "

Next J

Debug.Print Left(sMsg, Len(sMsg) - 2)

Next I

End Sub

Summary

Acknowledgements

This document has benefited from comments from various people including but not limited to Jerry Lewis; David Haiser; and Werner Prystav of the Institute of Agricultural Engineering (ATB) in Potsdam, Germany

References

Introduction to Simple Linear Regression, Gerard E. Dallal, 2000 http://www.tufts.edu/~gdallal/slr.htm

EXCEL: Multiple Regression, A. Colin Cameron, Dept. of Economics, Univ. of Calif. – Davis, 1999, http://cameron.econ.ucdavis.edu/excel/exmreg.html

Microsoft Excel 2000 and 2003 Faults, Problems, Workarounds and Fixes, David Haiser, 2006, http://www.daheiser.info/excel/frontpage.html

An Introduction to Statistics, David Stephenson, 2000, http://web.gfi.uib.no/~ngbnk/kurs/notes/node74.html
Regression analysis, Wikipedia, http://en.wikipedia.org/wiki/Regression_analysis

Appendix A: Code to retrieve trendline coefficients

The code below implements functions (UDFs) to retrieve trendline coefficients from a chart.

Option Explicit

Option Base 0

'Function TLcoef(...) returns Trendline coefficients

'Function TLeval(x, ...) evaluates the current trendline at a given x

'

'The arguments of TLcoef, and the last 4 of TLeval: _

vSheet is the name/number of the sheet containing the chart. _

Use of the name (as in the Sheet's tab) is recommended _

vCht is the name/number of the chart. To see this, deselect _

the chart, then shift-click it; its name will appear in the _

drop-down list at the left of formula bar. In the case of a _

chart in its own chartsheet, specify this as zero or the zero _

length string "" _

VSeries is a series name/number, and vTL is the series' trendline _

number. If the series has a name, it is probably better to _

specify the name. To determine the name/number, as well as _

the trendline number needed for vTL, pass the mouse arrow _

over the trendline. Of course, if there is only one series in _

the chart, you can set vSeries = 1, but beware if you add _

more series to the chart.

'First draft written 2003 March 1 by D J Braden _

Revisions by Tushar Mehta (www.tushar-mehta.com) 2005 Jun 19: _

Various documentation changes _

vCht is now 'optional' _

Correctly handles cases where a term is missing -- e.g., _

y = 2x3 + 3x + 10 _

Correctly handles cases where a coefficient is not shown because _

it is the default value -- e.g., y = Ln(x)+10 _

When only the constant term is present, the original function _

returned it in the correct array element only for the _

polynomial and linear fits. Now, the function returns it in _

the correct array element for other types also. For example, _

for an exponential fit, y=10 will be returned as (10,0) _

Arrays are now base zero.

'Limitations: _

The coefficients are returned to precision *displayed* _

To get the most accurate values, format the trendline label _

to scientific notation with 14 decimal places. (Right-click _

the label to do this) _

Given how XL calculation engine works -- recalculates the _

worksheet first, then the chart(s) -- it is eminently _

possible for the chart to show one trendline and the _

function to return coefficients corresponding to the values _

shown by the chart *prior* to the recalculation. To see the _

effect of this '1 recalculation cycle lag' plot a series of _

random numbers. _

An alternative to the functions in this module is the LINEST _

worksheet function. Except for those few cases where LINEST _

returns incorrect results, it is the more robust function _

since it doesn't suffer from the '1 recalculation cycle' _

lag. With XL2003 LINEST may even return more accurate _

results than the trendline.

Function TLcoef(vSheet, vCht, vSeries, vTL)

'To get the coefficients of a chart on a chartsheet, specify vCht _

as zero or the zero length string ""

'Return coefficients of an Excel chart trendline. _

Limitations: See the documentation at the top of the module _

'Note: For a polynomial fit, it is possible the trendline doesn't _

report all the terms. So this function returns an array of _

length (1 + the order of the requested fit), *not* the number of _

values displayed. The last value in the returned array is the _

constant term; preceeding values correspond to higher-order x.

Dim o As Trendline

Application.Volatile

If ParamErr(TLcoef, vSheet, vCht, vSeries, vTL) Then Exit Function

On Error Resume Next

If vCht = "" Or vCht = 0 Then

If TypeOf Sheets(vSheet) Is Chart Then

Set o = Sheets(vSheet).SeriesCollection(vSeries) _

.Trendlines(vTL)

Else

TLcoef = "#Err: vCht can be omitted only if vSheet is a " _

& "chartsheet"

Exit Function '*****

End If

Else

Set o = Sheets(vSheet).ChartObjects(vCht).Chart. _

SeriesCollection(vSeries).Trendlines(vTL)

End If

On Error GoTo 0

If o Is Nothing Then

TLcoef = "#Err: No trendline matches the specified parameters"

Else

TLcoef = ExtractCoef(o)

End If

End Function

Function TLeval(vX, vSheet, vCht, vSeries, vTL)

'DJ Braden

'Exp/logs are done for cases xlPower and xlExponential to _

allow for greater range of arguments.

Dim o As Trendline, vRet

Application.Volatile

If ParamErr(TLeval, vSheet, vCht, vSeries, vTL) Then Exit Function

On Error Resume Next

If vCht = "" Or vCht = 0 Then

If TypeOf Sheets(vSheet) Is Chart Then

Set o = Sheets(vSheet).SeriesCollection(vSeries) _

.Trendlines(vTL)

Else

TLeval = "#Err: vCht can be omitted only if vSheet is a " _

& "chartsheet"

Exit Function '*****

End If

Else

Set o = Sheets(vSheet).ChartObjects(vCht).Chart. _

SeriesCollection(vSeries).Trendlines(vTL)

End If

On Error GoTo 0

If o Is Nothing Then

TLeval = "#Err: No trendline matches the specified parameters"

Exit Function

End If

vRet = ExtractCoef(o)

If TypeName(vRet) = "String" Then TLeval = vRet: Exit Function

Select Case o.Type

Case xlLinear

TLeval = vX * vRet(LBound(vRet)) + vRet(UBound(vRet))

Case xlExponential 'see comment above

TLeval = Exp(Log(vRet(LBound(vRet))) + vX * vRet(UBound(vRet)))

Case xlLogarithmic

TLeval = vRet(LBound(vRet)) * Log(vX) + vRet(UBound(vRet))

Case xlPower 'see comment above

TLeval = Exp(Log(vRet(LBound(vRet))) _

+ Log(vX) * vRet(UBound(vRet)))

Case xlPolynomial

Dim Idx As Long

TLeval = vRet(LBound(vRet)) * vX + vRet(LBound(vRet) + 1)

For Idx = LBound(vRet) + 2 To UBound(vRet)

TLeval = vX * TLeval + vRet(Idx)

Next Idx

End Select

End Function

Private Function DecodeOneTerm(ByVal TLText As String, _

ByVal SearchToken As String, _

ByVal UnspecifiedConstant As Byte)

'splits {optional number}{SearchToken} _

{optional numeric constant}

Dim v(1) As Double, TokenLoc As Long

TokenLoc = InStr(1, TLText, SearchToken, vbTextCompare)

If TokenLoc = 0 Then

v(1) = CDbl(TLText)

Else

If TokenLoc = 1 Then v(0) = 1 _

Else v(0) = Left(TLText, TokenLoc - 1)

If TokenLoc + Len(SearchToken) > Len(TLText) Then _

v(1) = UnspecifiedConstant _

Else v(1) = Mid(TLText, TokenLoc + Len(SearchToken))

End If

DecodeOneTerm = v

End Function

Private Function getXPower(ByVal TLText As String, _

ByVal XPos As Long)

If XPos = Len(TLText) Then

getXPower = 1

ElseIf IsNumeric(Mid(TLText, XPos + 1, 1)) Then

getXPower = Mid(TLText, XPos + 1, 1)

Else

getXPower = 1

End If

End Function

Private Function ExtractCoef(o As Trendline)

Dim XPos As Long, s As String

On Error Resume Next

s = o.DataLabel.Text

On Error GoTo 0

If s = "" Then

ExtractCoef = "#Err: No trendline equation found"

Exit Function '*****

End If

If o.DisplayRSquared Then s = Left$(s, InStr(s, "R") - 2)

s = Trim(Mid(s, InStr(1, s, "=", vbTextCompare) + 1))

Select Case o.Type

Case xlMovingAvg

Case xlLogarithmic

ExtractCoef = DecodeOneTerm(s, "Ln(x)", 0)

Case xlLinear

ExtractCoef = DecodeOneTerm(s, "x", 0)

Case xlExponential

s = Application.WorksheetFunction.Substitute(s, "x", "")

ExtractCoef = DecodeOneTerm(s, "e", 1)

Case xlPower

ExtractCoef = DecodeOneTerm(s, "x", 1)

Case xlPolynomial

Dim lOrd As Long

ReDim v(o.Order) As Double

s = Application.WorksheetFunction.Substitute(s, " ", "")

s = Application.WorksheetFunction.Substitute(s, "+x", "+1x")

s = Application.WorksheetFunction.Substitute(s, "-x", "-1x")

Do While s <> ""

XPos = InStr(1, s, "x")

If XPos = 0 Then

v(o.Order) = s 'constant term

s = ""

Else

lOrd = getXPower(s, XPos)

If XPos = 1 Then v(UBound(v) - lOrd) = 1 _

Else _

v(UBound(v) - lOrd) = Left(s, XPos - 1)

If XPos = Len(s) Then

s = ""

ElseIf IsNumeric(Mid(s, XPos + 1, 1)) Then

s = Trim(Mid(s, XPos + 2))

Else

s = Trim(Mid(s, XPos + 1))

End If

End If

Loop

ExtractCoef = v

End Select

End Function

Private Function ParamErr(v, ParamArray parms())

Dim l As Long

For l = LBound(parms) To UBound(parms)

If VarType(parms(l)) = vbError Then

v = parms(l)

ParamErr = True

Exit Function

End If

Next l

End Function

Trendline coefficients (2024)

FAQs

Trendline coefficients? ›

A trendline shows the trend in a data set and is typically associated with regression analysis. Creating a trendline and calculating its coefficients allows for the quantitative analysis of the underlying data and the ability to both interpolate and extrapolate the data for forecast purposes.

How do you interpret a TREND line? ›

Interpreting a Trend Line

A trend line with a positive slope indicates a positive relationship between the variables. They increase or decrease together. A trend line with a negative slope indicates a negative relationship between the two variables. When one goes up, the other goes down, and vice versa.

What does the R2 value mean in Excel trendline? ›

Trendline equation is a formula that finds a line that best fits the data points. R-squared value measures the trendline reliability - the nearer R2 is to 1, the better the trendline fits the data.

How do you interpret TREND lines in Excel? ›

When you fit a trendline to your data, Graph automatically calculates its R-squared value. If you want, you can display this value on your chart. A linear trendline is a best-fit straight line that is used with simple linear data sets. Your data is linear if the pattern in its data points resembles a line.

What does a polynomial trendline tell you? ›

A polynomial trendline is a curved line that is used when data fluctuates. It is useful, for example, for analyzing gains and losses over a large data set. The order of the polynomial can be determined by the number of fluctuations in the data or by how many bends (hills and valleys) appear in the curve.

How do you describe a trendline on a line graph? ›

A trendline is a line superimposed on a chart revealing the overall direction of the data. Google Charts can automatically generate trendlines for Scatter Charts, Bar Charts, Column Charts, and Line Charts. Google Charts supports three types of trendlines: linear, polynomial, and exponential.

How do you know if a trend line is strong? ›

By paying attention to the steepness of the Trend Line. For example: If your Trend Line is getting flatter, it means the market is moving into a range condition. And if your Trend Line is getting steeper, it means the trend is becoming stronger (or possibly going into a buying climax).

What is a good R and R2 value? ›

In finance, an R-squared above 0.7 would generally be seen as showing a high level of correlation, whereas a measure below 0.4 would show a low correlation. This is not a hard rule, however, and will depend on the specific analysis.

What does the correlation coefficient R tell us about our trendline? ›

The correlation coefficient r measures the direction and strength of a linear relationship.

What does coefficient of trendline mean? ›

A trendline shows the trend in a data set and is typically associated with regression analysis. Creating a trendline and calculating its coefficients allows for the quantitative analysis of the underlying data and the ability to both interpolate and extrapolate the data for forecast purposes.

How do you read a trendline equation? ›

To find your equation of a trend line, follow these steps.
  1. Step 1: Draw your trend line. You begin by drawing your trend line. ...
  2. Step 2: Locate two points on the line. Your next step is to locate two points on the trend line. ...
  3. Step 3: Plug these two points into the formula for slope. The formula for slope is this one:
Jan 5, 2022

How do you evaluate a trend in Excel? ›

Draw a trendline to visualize the trend

To add a trend to an existing chart, right-click the data series, and then click Add Trendline… This will create the default linear trendline for the current data and open the Format Trendline pane where you can choose another trendline type.

What does a trendline show? ›

Trendlines, also known as lines of best fit or regression lines, graphically illustrate trends in data series and are commonly used when charting predictions. A trendline is typically a line or curve that connects or passes through two or more points in the series, showing a trend.

Does trendline indicate correlation? ›

Trend lines are lines used to approximate the general shape of a scatter plot. A positive trend line tells us the scatter plot has a positive correlation. A negative trend line tells us the scatter plot has a negative correlation.

Does a trendline show correlation? ›

In a scatter plot , a trend line is great for showing a correlation in your data. If your data points are scattered all over the chart, a trend line is useless – but if you see a trend in the data that you want to show, Datawrapper can draw a trend line for you.

What are two factors that determine the significance of a trendline? ›

Three factors determine the importance of a trendline: the lenght of time it has been intact, the number of time it has been tested and. the steepness or slope of the trendline.

How do you describe a trend in data? ›

to describe data in trend graphs:
  1. There was a.
  2. slight.
  3. small.
  4. gradual.
  5. steady.
  6. significant.
  7. dramatic.
  8. sharp.

How do you write a trend analysis? ›

  1. 1 – Choose Which Pattern You Want to Identify. The first and most obvious step in trend analysis is to identify which data trend you want to target. ...
  2. 2 – Choose Time Period. ...
  3. 3 – Choose Types of Data Needed. ...
  4. 4 – Gather Data. ...
  5. 5 – Use Charting Tools to Visualize Data.
  6. 6 – Identify Trends.
Feb 8, 2023

How do you know if a trend is strong or weak? ›

How do you identify trends? The best way to identify trends, in my experience, is to use simple price action. Higher highs and higher lows signal an uptrend, while lower highs and lower lows represent a downtrend.

How do I know if a trend is losing strength? ›

The direction of the ADX line is important for reading trend strength. When the ADX line is rising, trend strength is increasing, and the price moves in the direction of the trend. When the line is falling, trend strength is decreasing, and the price enters a period of retracement or consolidation.

How do you identify a weak trend? ›

A healthy trend is when the market has a healthy pullback and remains above the 50MA. In such market conditions, it's possible to trade the pullback. Possibly towards the 50MA or, previous Resistance turned Support (in an uptrend). A weak trend is when the market has steep pullbacks but remains above the 200MA.

What is a good R coefficient? ›

The relationship between two variables is generally considered strong when their r value is larger than 0.7. The correlation r measures the strength of the linear relationship between two quantitative variables.

Is an R value of 0.5 good? ›

Correlation coefficients whose magnitude are between 0.5 and 0.7 indicate variables which can be considered moderately correlated. Correlation coefficients whose magnitude are between 0.3 and 0.5 indicate variables which have a low correlation.

How do you interpret the linear regression coefficient? ›

Interpreting Linear Regression Coefficients

A positive coefficient indicates that as the value of the independent variable increases, the mean of the dependent variable also tends to increase. A negative coefficient suggests that as the independent variable increases, the dependent variable tends to decrease.

What is the difference between r2 and correlation coefficient? ›

So, what's the difference between correlation and R-squared? Correlation measures the strength of the relationship between two variables, while R-squared measures the amount of variation in the data that is explained by the model.

What does an r2 value of 0.99 mean? ›

Practically R-square value 0.90-0.93 or 0.99 both are considered very high and fall under the accepted range. However, in multiple regression, number of sample and predictor might unnecessarily increase the R-square value, thus an adjusted R-square is much valuable.

What is considered a weak correlation? ›

If we wish to label the strength of the association, for absolute values of r, 0-0.19 is regarded as very weak, 0.2-0.39 as weak, 0.40-0.59 as moderate, 0.6-0.79 as strong and 0.8-1 as very strong correlation, but these are rather arbitrary limits, and the context of the results should be considered.

Is a higher coefficient better? ›

In most cases, the lower the coefficient of variation the better because it means the spread of data values is low relative to the mean. The following examples illustrate this phenomenon in different fields.

What is the best degree of trendline? ›

Pay attention to the angle of the trendline. If it's less than 30 degrees, a trend is too steep and unstable. It's better when the trend's angle exceeds 45 degrees.

What is the R-squared of a trendline? ›

R-squared is the percentage of the dependent variable variation that a linear model explains. 0% represents a model that does not explain any of the variation in the response variable around its mean. The mean of the dependent variable predicts the dependent variable as well as the regression model.

What does a trendline tell you on a scatter plot? ›

A trend line is a straight line that best represents the points on a scatterplot. The trend line may go through some points but need not go through them all. The trend line is used to show the pattern of the data. This trend line may show a positive trend or a negative trend.

How do you measure trend value? ›

Measurement of Trend by the Method of Moving Average

It measures the trend by eliminating the changes or the variations by means of a moving average. The simplest of the mean used for the measurement of a trend is the arithmetic means (averages).

How do you use a trendline to find the predicted value in Excel? ›

Forecast the future with Excel trendlines
  1. Create. a bar chart of the data you've tracked so far.
  2. Click. on your chart, and then click on the data series.
  3. Go. to Chart | Add Trendline.
  4. Click. on the Options tab.
  5. In. the Forecast section, click on the up arrow in the Forecast box until the. ...
  6. Click. OK.

How do you describe the trend of a graph? ›

Trends
  1. A trend is a pattern in a set of results displayed in a graph.
  2. In the graph above, although there is not a straight line increase in figures, overall the trend here is that sales are increasing.

How do you interpret a trendline in a scatter plot? ›

A trend line is a straight line that best represents the points on a scatterplot. The trend line may go through some points but need not go through them all. The trend line is used to show the pattern of the data. This trend line may show a positive trend or a negative trend.

How do you use a trend line to predict? ›

To draw a trend line, use a straightedge to draw a line that has about the same number of points above and below it. Ignore any outliers. Use the above trend line to predict how long it would take for Lily to run 4.5 miles. Time taken for Lily to run 4.5 miles is about 45 minutes.

What is the best description of trend analysis? ›

Trend analysis is a technique used to examine and predict movements of an item based on current and historical data. You can use trend analysis to improve your business using trend data to inform your decision-making.

How do you interpret a line graph example? ›

Interpreting Line Charts

The changing slope of the line segments emphasizes changes, trends, and patterns. For a single series of data, assess the changes in the line to identify trends and patterns. When you have multiple metrics, compare their lines to determine whether they have the same trend and patterns.

What does R-squared tell you about a trendline? ›

R-squared is a statistical measure that indicates how much of the variation of a dependent variable is explained by an independent variable in a regression model.

What is the trends method best used to predict? ›

The trends method involves determining the speed and direction of movement for fronts, high and low pressure centers, and areas of clouds and precipitation. Using this information, the forecaster can predict where he or she expects those features to be at some future time.

Top Articles
Latest Posts
Article information

Author: Golda Nolan II

Last Updated:

Views: 6617

Rating: 4.8 / 5 (78 voted)

Reviews: 85% of readers found this page helpful

Author information

Name: Golda Nolan II

Birthday: 1998-05-14

Address: Suite 369 9754 Roberts Pines, West Benitaburgh, NM 69180-7958

Phone: +522993866487

Job: Sales Executive

Hobby: Worldbuilding, Shopping, Quilting, Cooking, Homebrewing, Leather crafting, Pet

Introduction: My name is Golda Nolan II, I am a thoughtful, clever, cute, jolly, brave, powerful, splendid person who loves writing and wants to share my knowledge and understanding with you.