# LINEST function

Calculates the statistics for a line by using the "least squares" method to calculate a straight line that best fits your data, and returns an array (array: A set of data used to build single formulas that produce multiple results or that operate on a group of arguments arranged in rows and columns. Excel has two kinds of arrays: array ranges and array constants. An array range is a rectangular area of cells that share a common formula; an array constant is a specially arranged group of constants that's used as an argument in a formula.) that describes the line. Because this function returns an array of values, it must be entered as an array formula (array formula: A single formula that performs multiple calculations and then returns either a single result or multiple results. Array formulas act on two or more sets of values, known as array arguments. Each array argument must have the same number of rows and columns. To return multiple results, the formula must be entered into multiple cells. When you enter an array formula by pressing ⌘+RETURN. Excel automatically encloses the array formula between braces ( { } ).).

The equation for the line is y = mx + b or y = m1x1 + m2x2 + ... + b (if there are multiple ranges of x values) where the dependent y value is a function of the independent x values. The m values are bases corresponding to each exponent x value, and b is a constant value. Note that y, x, and m can be vectors. The array that LINEST returns is {m n,m n-1,...,m 1,b}. LINEST can also return additional regression statistics.

### Syntax

LINEST(known_y's, known_x's, const, stats )

Argument Description Remarks
known_y's The set of y values that you already know in the relationship y = mx + b.
known_x's An optional set of x values that you may already know in the relationship y = mx + b.
const A logical value that specifies whether to force the constant b to equal 0.
• If const is TRUE or omitted, b is calculated normally.
• If const is FALSE, b is set equal to 0 (zero), and the m values are adjusted so that y = mx.
stats A logical value that specifies whether to return additional regression statistics.
• If stats is TRUE, this function returns the additional regression statistics, so that the returned array is {m n,m n-1,...,m 1,b;se n,se n-1,...,se 1,se b;r 2,se y;F,d f;ss reg,ss resid}.
• If stats is FALSE or omitted, this function returns only the m coefficients and the constant b.

Statistic Description
se 1,se 2,...,se n The standard error values for the coefficients m 1,m 2,...,m n.
se b The standard error value for the constant b (se b = #N/A when const is FALSE).
r 2 The coefficient of determination. Compares estimated and actual y values, and ranges in value from 0 to 1. If it is 1, there is a perfect correlation in the sample — there is no difference between the estimated y value and the actual y value. At the other extreme, if the coefficient of determination is 0, the regression equation is not helpful in predicting a y value. For information about how r 2 is calculated, see the General Remarks section of this topic.
se y The standard error for the y estimate.
F The F statistic, or the F observed value. Use the F statistic to determine whether the observed relationship between the dependent and independent variables occurs by chance.
d f The degrees of freedom. Use the degrees of freedom to help you find F critical values in a statistical table. Compare the values you find in the table to the F statistic returned by LINEST to determine a confidence level for the model. For information about how d f is calculated, see the General Remarks section of this topic. Example 4 below shows use of F and d f.
ss reg The regression sum of squares.
ss resid The residual sum of squares. For information about how ss reg and ss resid are calculated, see the General Remarks section of this topic.

The following illustration shows the order in which the additional regression statistics are returned.

• You can describe any straight line with the slope and the y intercept:

To find the slope of a line, often written as m, take two points on the line, (x1,y1) and (x2,y2); the slope is equal to (y2 - y1)/(x2 - x1).

The y intercept of a line, often written as b, is the value of y at the point where the line crosses the y axis.

The equation of a straight line is y = mx + b. Once you know the values of m and b, you can calculate any point on the line by plugging the y or x value into that equation. You can also use the TREND function.

• When you have only one independent x variable, you can obtain the slope and y intercept values directly by using the following formulas:

For slope, use =INDEX(LINEST(known_y's, known_x's), 1 )

For y intercept, use =INDEX(LINEST(known_y's, known_x's), 2 )

• The accuracy of the line calculated by LINEST depends on the degree of scatter in your data. The more linear the data, the more accurate the LINEST model. LINEST uses the method of least squares for determining the best fit for the data. When you have only one independent x variable, the calculations for m and b are based on the following formulas:

where x and y are sample means, i.e., x = AVERAGE(known x's) and y = AVERAGE(known_y's).

• The line- and curve-fitting functions LINEST and LOGEST can calculate the best straight line or exponential curve that fits your data. However, you have to decide which of the two results best fits your data. You can calculate TREND(known_y's, known_x's ) for a straight line, or GROWTH(known_y's, known_x's ) for an exponential curve. These functions, without the new_x's argument, return an array of y values predicted along that line or curve at your actual data points. You can then compare the predicted values with the actual values. You may want to chart them both for a visual comparison.
• In regression analysis, Excel calculates for each point the squared difference between the y value estimated for that point and its actual y value. The sum of these squared differences is called the residual sum of squares, ss resid. Excel then calculates the total sum of squares, ss total. When const = TRUE, or omitted, the total sum of squares is the sum of the squared differences between the actual y values and the average of the y values. When const = FALSE, the total sum of squares is the sum of the squares of the actual y values (without subtracting the average y value from each individual y value). Then regression sum of squares, ss reg, can be found from: ss reg = ss total - ss resid. The smaller the residual sum of squares is, compared with the total sum of squares, the larger the value of the coefficient of determination, r 2, which is an indicator of how well the equation resulting from the regression analysis explains the relationship among the variables. r 2 = ss reg/ss total.
• In some cases, one or more of the x columns (assume that y's and x's are in columns) may have no additional predictive value in the presence of the other x columns. In other words, eliminating one or more x columns might lead to predicted y values that are equally accurate. In that case these redundant x columns should be omitted from the regression model. This phenomenon is called "collinearity" because any redundant x column can be expressed as a sum of multiples of the non-redundant x columns. LINEST checks for collinearity and removes any redundant x columns from the regression model when it identifies them. Removed x columns can be recognized in LINEST output as having 0 coefficients as well as 0 se's. If one or more columns are removed as redundant, then d f is affected because d f depends on the number of x columns actually used for predictive purposes. For details on the computation of d f, see Example 4 below. If d f is changed because redundant x columns are removed, values of se y and F are also affected. Collinearity should be relatively rare in practice. However, one case where it is more likely to arise is when some x columns contain only 0's and 1's as indicators of whether a subject in an experiment is or is not a member of a particular group. If const = TRUE or omitted, LINEST effectively inserts an additional x column of all 1's to model the intercept. If you have a column with a 1 for each subject if male, or 0 if not, and you also have a column with a 1 for each subject if female, or 0 if not, this latter column is redundant because entries in it can be obtained from subtracting the entry in the "male indicator" column from the entry in the additional column of all 1's added by LINEST.
• d f is calculated as follows when no x columns are removed from the model due to collinearity: if there are k columns of known_x's and const = TRUE or omitted, then d f = n - k - 1. If const = FALSE, then d f = n - k. In both cases, each x column removed due to collinearity increases d f by 1.
• Be aware that the y values predicted by the regression equation may not be valid if they are outside the range of the y values that you used to determine the equation.

To make the following example easier to understand, you can copy the data to a blank sheet and then enter the function underneath the data. Do not select the row or column headings (1, 2, 3... A, B, C...) when you copy the sample data to a blank sheet.

Note   The formula in the example must be entered as an array formula (array formula: A single formula that performs multiple calculations and then returns either a single result or multiple results. Array formulas act on two or more sets of values, known as array arguments. Each array argument must have the same number of rows and columns. To return multiple results, the formula must be entered into multiple cells. When you enter an array formula by pressing ⌘+RETURN. Excel automatically encloses the array formula between braces ( { } ).). First, type the formula into cell A7 and then press RETURN . The single result is 2. Next, select the range A7:B7, press CONTROL + U , and then press + RETURN . When entered as an array, the slope (2) and the y-intercept (1) are returned.

Known Y's Known X's
1 0
9 4
5 2
7 3
Formula Formula
=LINEST(A2:A5, B2:B5,, FALSE )

Suppose a small business has sales of \$3,100, \$4,500, \$4,400, \$5,400, \$7,500, and \$8,100 during the first six months of the fiscal year. You can use the following simple linear regression model to estimate sales for the ninth month.

To make the following example easier to understand, you can copy the data to a blank sheet and then enter the function underneath the data. Do not select the row or column headings (1, 2, 3... A, B, C...) when you copy the sample data to a blank sheet.

Note   In general, the formula =SUM({m,b}*{x,1}) equals mx + b, the estimated y value for a given x value. You can also use the TREND function.

Month Sales
1 3100
2 4500
3 4400
4 5400
5 7500
6 8100
Formula Description (Result)
=SUM(LINEST(B2:B7, A2:A7)*{9,1} ) Estimate sales for the ninth month (11000)

Suppose a commercial developer is considering the purchase of a group of small office buildings in an established business district. The developer can use multiple linear regression analysis to estimate the value of an office building in a given area based on the following variables.

Variable Refers to the
y Assessed value of the office building
x1 Floor space in square feet
x2 Number of offices
x3 Number of entrances
x4 Age of the office building in years

This example assumes that a straight-line relationship exists between each independent variable (x1, x2, x3, and x4) and the dependent variable (y), the value of office buildings in the area.

The developer randomly chooses a sample of 11 office buildings from a possible 1,500 office buildings and obtains the following data. "Half an entrance" means an entrance for deliveries only.

To make the following example easier to understand, you can copy the data to a blank sheet and then enter the function underneath the data. Do not select the row or column headings (1, 2, 3... A, B, C...) when you copy the sample data to a blank sheet.

Floor space (x1) Office (x2) Entrances (x3) Age (x4) Assessed value (y)
2310 2 2 20 142000
2333 2 2 12 144000
2356 3 1.5 33 151000
2379 3 2 43 150000
2402 2 3 53 139000
2425 4 2 23 169000
2448 2 1.5 99 126000
2471 2 2 34 142900
2494 3 3 23 163000
2517 4 4 55 169000
2540 2 3 22 149000
Formula
=LINEST(E2:E12, A2:D12, TRUE, TRUE )

Note   The formula in the example must be entered as an array formula (array formula: A single formula that performs multiple calculations and then returns either a single result or multiple results. Array formulas act on two or more sets of values, known as array arguments. Each array argument must have the same number of rows and columns. To return multiple results, the formula must be entered into multiple cells. When you enter an array formula by pressing ⌘+RETURN. Excel automatically encloses the array formula between braces ( { } ).). First, type the formula into cell A14 and then press RETURN . The single result is -234.2371645. Next, select the range A14:E18, press CONTROL + U , and then press + RETURN .

When entered as an array formula, the following regression statistics are returned. See the Additional Regression Statistics section of this topic to identify the statistic you want.

A B C D E
-234.2371645 2553.21066 12529.76817 27.64138737 52317.83051
13.26801148 530.6691519 400.0668382 5.429374042 12237.3616
0.996747993 970.5784629 #N/A #N/A #N/A
459.7536742 6 #N/A #N/A #N/A
1732393319 5652135.316 #N/A #N/A #N/A

The multiple regression equation, y = m1*x1 + m2*x2 + m3*x3 + m4*x4 + b, can now be obtained using the values from row 14:

y = 27.64*x1 + 12,530*x2 + 2,553*x3+ 234.24*x4 + 52,318

The developer can now estimate the assessed value of an office building in the same area that has 2,500 square feet, three offices, and two entrances and is 25 years old, by using the following equation:

y = 27.64*2500 + 12530*3 + 2553*2 - 234.24*25 + 52318 = \$158,261

Or you can copy the following table to cell A21 of the example workbook.

Floor space (x1) Office (x2) Entrances (x3) Age (x4) Assessed value (y)
2500 3 2 25 =D14*A22 + C14*B22 + B14*C22 + A14*D22 + E14

You can also use the TREND function to calculate this value.

In Example 3, the coefficient of determination, or r 2, is 0.99675 (see cell A17 in the output for LINEST), which would indicate a strong relationship between the independent variables and the sale price. You can use the F statistic to determine whether these results, with such a high r 2 value, occurred by chance.

Assume for the moment that in fact there is no relationship among the variables, but that you have drawn a rare sample of 11 office buildings that causes the statistical analysis to demonstrate a strong relationship. The term "Alpha" is used for the probability of erroneously concluding that there is a relationship.

F and d f in LINEST output can be used to assess the likelihood of a higher F value occurring by chance. F can be compared with critical values in published F distribution tables or the FDIST function can be used to calculate the probability of a larger F value occurring by chance. The appropriate F distribution has v1 and v2 degrees of freedom. If n is the number of data points and const = TRUE or omitted, then v1 = n - df - 1 and v2 = df. (If const = FALSE, then v1 = n - df and v2 = df.) FDIST(F,v1,v2) will return the probability of a higher F value occurring by chance. In this example, d f = 6 (cell B18) and F = 459.753674 (cell A18).

Assuming an Alpha value of 0.05, v1 = 11 – 6 – 1 = 4 and v2 = 6, the critical level of F is 4.53. Since F = 459.753674 is much higher than 4.53, it is extremely unlikely that an F value this high occurred by chance. (With Alpha = 0.05, the hypothesis that there is no relationship between known_y's and known_x's is to be rejected when F exceeds the critical level, 4.53.) Using the FDIST function, you can obtain the probability that an F value this high occurred by chance. FDIST(459.753674, 4, 6) = 1.37E-7, an extremely small probability. You can conclude, either by finding the critical level of F in a table or by using the FDIST function, that the regression equation is useful in predicting the assessed value of office buildings in this area. Remember that it is critical to use correct values of v1 and v2 computed in the previous paragraph.

Another hypothesis test will determine whether each slope coefficient is useful in estimating the assessed value of an office building in Example 3. For example, to test the age coefficient for statistical significance, divide -234.24 (age slope coefficient) by 13.268 (the estimated standard error of age coefficients in cell A15). The following is the t observed value:

t = m 4 ÷ se 4 = -234.24 ÷ 13.268 = -17.7

If the absolute value of t is sufficiently high, it can be concluded that the slope coefficient is useful in estimating the assessed value of an office building in Example 3. The table below shows the absolute values of the 4 t observed values.

If you consult a table in a statistics manual, you will find that t critical, two tailed, with 6 degrees of freedom and Alpha = 0.05 is 2.447. This critical value can also be found using the TINV function. TINV(0.05,6) = 2.447. Because the absolute value of t, 17.7, is greater than 2.447, age is an important variable when estimating the assessed value of an office building. Each of the other independent variables can be tested for statistical significance in a similar manner. The following are the t observed values for each of the independent variables.

Variable t-observed value
Floor space 5.1
Number of offices 31.3
Number of entrances 4.8
Age 17.7

These values all have an absolute value greater than 2.447; therefore, all the variables used in the regression equation are useful in predicting the assessed value of office buildings in this area.

LOGEST function

INDEX (reference) function

TREND function

F.DIST.RT function

T.INV.2T function

T.INV function

List of all functions (by category)

Applies to:
Excel for Mac 2011