
How Do I Calculate Outliers In Excel?


Search Excel Forum Posts, Tutorials, Macros, Tips, and More
Is there a way for excel to automatically determine outliers in a given data
set and exclude them from future calculations?
Similar Excel Video Tutorials
Date Math and IF function
 See how to calculate the Maturity Date for a loan, how to calculate a running balance for an account using the IF function, and see how to calculate t ...
2 Ways To Calculate Geometric Mean
 Download Excel workbook http://people.highline.edu/mgirvin/ExcelIsFun.htm Using Historical Stock Data see two different ways to calculate the geo ...
Similar Topics
hi,
i have a scatter plot for which i want to insert a best fit line. however, there are some outliers within the data. i have manually identified these outliers and coloured them differently from the nonoutliers, but what i would like to do now is to insert a bestfit straight line for the remaining good data, while still keeping the outliers on the scatter plot. is there a way to do that?
i found this post:
http://thedailyreviewer.com/office/v...iers110140914
but do not understand how to "fit the trend line based on the new set of points, but graph the old set".
thanks for any help.
I've been racking my brain over this all weekend. Can you help me determine the best minimum time excluding outliers?
For example:
Dave 15
Jim 10
Mike 12
John 5
Jill 12
Kim 13
Joey 21
Ralph 6
Sue 20
I would consider 5, 6, 20, and 21 outliers. I need to remove them to show the minimum (or best time) which in this case would be 10.
Thanks,
Chad
does anyone know if I can get outliers from a histogram, or do I need to do a
scatter plot? I am using excel (most recent version) for mac.
I have lots of data, mostly triplicate results, with some outliers. I would
appreciate help on applying Thompson tau or other outlier removal technique
in Excel.
Thanks!
Hi!! I've been trying with no luck to come up with a way to:
a) find all the outliers in a range of a 1000 rows. The criteria is simply +/ average*2. The outlier can either be above or below.
b) colour the background of these outliers.
c) repeat a&b in the next column.
So in essence, I'm trying to find all values in a table that are twice (positive or negative) that of the average and highlight them.
Example:
A, B
4, 1
4, 1
4, 1
4, 4
4, 1
4, 1
4, 1
12, 1
In this case the outlier for A is 12 and the outlier for B is 4.
Any help, be it in a formula for a given cell, or VBA, is deeply appreciated.
Sincerely
Kurifodo
Using the averaging approach and descriptive statistics can anyone tell me how do i find outliers? i know how to do it using the 5number summary but cant figure out how to do it this way
thanks.
hi all,
i have a range of numbers in C5:AL5
Not all of the cells are filled in, and some that are filled in are blank.
In AM5 I calculate the average of the data
and in AN5 the standard deviation where the value in the cells in the range are >0
Is it possible to calculate a standard deviation in another cell which excludes those values in the range which are 3 standard deviations away from the value in cell AN5.
Therefore, the standard deviation with the outliers.
If there is another way of doing this, i would be grateful for your help.
Luce
Hello,
When evaluating a set of items for their popularity, I pull popularity metrics from two different sources, aggregating each score into an average. While I am satisfied with this metric (Let's call it the 'popularity score'), I'd like to create a simplified, more meaningful score.
My initial though was to create an index using the following formula:
Code:
(D3/MAX($D:$D))*100)
Where D3 is the individual popularity score and D represents all popularity scores.
I discovered, however, that outliers would have a disruptive effect on the data. For instance, if there where several VERY popular items, other items with acceptable popularity scores would appear very low in relative terms.
Any ideas on how to account for the Outliers while still retaining the meaning in an easy to understand way?
Here is part of the data:
Here is my data (D3:D29):
0.00
0.00
0.00
42.56
0.00
680.99
0.00
0.00
60.80
0.00
18.24
0.00
85.12
170.24
0.00
0.00
297.92
103.36
0.00
0.00
0.00
30.40
18.24
30.40
72.96
0.00
A colleague suggest that I take the lognormal distribution and then take the antilog. Any idea how to implement, or if this is a sound approach?
Hello All,
I'm new to the forum and have not needed to do anything like this before but assume that excel can do this kind of thing.
I have tried to search for a solution everywhere.
OK, I have a large data set of triplicates in column F (example below):
B
C
D
E
F
1
121
INT#6
SYBR
Unknown
25.944914
2
122
INT#6
SYBR
Unknown
25.577063
3
123
INT#6
SYBR
Unknown
24.799803
4
124
INT#6
SYBR
Unknown
27.107956
5
125
INT#6
SYBR
Unknown
26.962341
6
126
INT#6
SYBR
Unknown
27.189684
7
127
INT#6
SYBR
Unknown
26.185757
8
128
INT#6
SYBR
Unknown
26.0001
9
129
INT#6
SYBR
Unknown
26.207401
10
130
INT#6
SYBR
Unknown
25.44769
11
131
INT#6
SYBR
Unknown
25.850075
12
132
INT#6
SYBR
Unknown
26.147488
13
133
INT#6
SYBR
Unknown
24.232754
14
134
INT#6
SYBR
Unknown
24.747488
15
135
INT#6
SYBR
Unknown
24.813463
16
136
INT#6
SYBR
Unknown
25.699005
17
137
INT#6
SYBR
Unknown
25.42176
18
138
INT#6
SYBR
Unknown
25.657625
19
139
INT#6
SYBR
Unknown
25.596703
20
140
INT#6
SYBR
Unknown
24.756275
21
141
INT#6
SYBR
Unknown
24.952007
22
142
INT#6
SYBR
Unknown
25.594326
23
143
INT#6
SYBR
Unknown
26.192163
24
144
INT#6
SYBR
Unknown
24.788391
so one triplicate is 121 to 123 (data in F1:F3) then the next set is 124 to 126 (data in F4:F6).
What I want to do is compare each number within the triplicate to the others within the same triplicate and determine which are +/ 0.5 from the others and label them as outliers.
I have come up with a poor solution that works for most instances but not when two values are 0.5 from each other, neither is marked as an outlier. In this case I would like to the formula to choose the value furthest from the median as the outlier.
I would also ideally like to mark the outliers by somehow highlighting in red.
my poor solution occupies the next 3 columns G,H and I and is very clunky:
G
H
I
1
=IF(OR($F2($F3+0.5)),1,0)
=IF(OR($F2($F4+0.5)),1,0)
=SUM(I2:J2)
2
=IF(OR($F3($F2+0.5)),1,0)
=IF(OR($F3($F4+0.5)),1,0)
=SUM(I3:J3)
3
=IF(OR($F4($F2+0.5)),1,0)
=IF(OR($F4($F3+0.5)),1,0)
=SUM(I4:J4)
4
=IF(OR($F5($F6+0.5)),1,0)
=IF(OR($F5($F7+0.5)),1,0)
=SUM(I5:J5)
5
=IF(OR($F6($F5+0.5)),1,0)
=IF(OR($F6($F7+0.5)),1,0)
=SUM(I6:J6)
6
=IF(OR($F7($F5+0.5)),1,0)
=IF(OR($F7($F6+0.5)),1,0)
=SUM(I7:J7)
The if the result in column I is 2 then the corresponding value in the data set is an outlier.
Can anyone help?
I hope my explanation of the problem is understandable.
This is really bugging me. If I can get the formula to work then It will save me days of trawling through data picking outliers by eye.
I eagerly await a solution and will try things of my own in the mean time but as you can see I am far from an excel master.
Cheers
Hi everyone...
here is the situation...i need a formula for median to exclude the min and the max value ..that are above 0!!!
i tried this formula ...(ARRAY!) but i get Num error! ..any ideas?
suppose H28:H32 is the data range!
{=+MEDIAN(IF(H28:H32>0,IF(H28:H32>MIN(H28:H32),IF(H28:H32<MAX(H28:H32),H28:H32))))}
Is there a good way of excluding an outlier in an average calculation. In the example below will I exclude 1000 from the averagecalculation.
The way to decide excludingvalues can either be a percent based on the range or everything that is a higher than a user defined value. It can also be more than one outlier.
A user defined function is OK with me, if it is impossible to use the builtin functions. Any suggestion?
Code:
1
2
3
4
5
6
7
1000
8
9
10
11
12
Hi.
I am analyzing sales figures and need some assistance. I have a variety of products and a variety of territories where these products are sold. Additionally I have Gross Profit Margin figures for each product sold and many of the same products have differing GP % due to flexibility of sales agents setting the sale price.
Ideally I would love to find out what products are the most profitable, along with which areas these products sell for the most effectively.
However, after developing a Pivot Table with products in rows and Territories in Columns, I realized some products have "outlier like" GP%. These products have zero revenue as they were given to the customer as a freebie in a large contract which throws off my analysis on how profitable the product category is.
Would you recommend that I remove outliers and solely calculate based on 'normally' priced products? And second, I am also suppose to analyze large order (contain many products), how do you suggest I do that (especially with consideration to question 1)??
Thanks!!
Can anyone help with a box and whisker related graph?
I can do the 50% quartile method of sorting this out but for a piece of research I need to study the middle 75% and exclude the top and bottom 12.5% of outliers only.
I tried the percentile approach to calculate the top and bottom 12.5% but this of course calculates figures that don't match up to actual data points, unlike the quartile method and median which actually do match up to the data points. As i need my box graph to match up to all the data points in my presentation the percentile approach doesn't work for me.
Does anyone have any idea how this can be done?
Thanks
Hi all,
I'm trying to learn how conditional formatting works, and I kind of have the gist of it but I can't get it to work after calculations. Here's what I'm trying to do:
I want an icon to appear after a sum calculation formula so that I can spot outliers easily. It needs to calculate the sum of 3 numbers (found in L2,M2,N2) and if that sum is between 5 and 22 then let a particular icon appear, but if it's outside that range, then let another icon appear. Icons should appear in cell O2. I'm going to copy the formula down the column so it doesn't need to use absolute references.
I'm not sure how to set it up because I'm confused on my syntax of commands. I know I probably need to use "countif" or "if", but where do I place the "sum" function?
Any help is greatly appreciated.
I'm attempting to do my first forecast, but I seem to be failing miserably. When I use the FORECAST function all of the forecasted totals are much too high. I
Is there a way to exclude outliers or use a moving average..... or am I just using the function completely wrong?
Any pointers would be greatly appreciated, when it's all said and done I would love to take the previous 6 months of processed data and use that to forecast the processed amounts for the next 6 months... but as it stands now these numbers won't be nearly accurate enough.
Hi,
Sorry if this comes off as very basic, but it's such a specific answer i'm looking for, and think this would be the fastest way to solve the problem. Here's my scenario:
I am analyzing data in a lab, and need to filter this data for outliers before taking it any further. The way we've decided to filter the data is to calculate (2 x standard deviation) for each column, and add/subtract this calculation from the average for each column to establish "ceiling" values for upper and lower extremes. Any data point exceeding the ceilings is to be attenuated to the ceiling itself.
For example, suppose a column has an average of 100, and a standard deviation of 50. In this case, the ceiling values would be 0 and 200. If a data point is found to be, say, 213, it would be adjusted to 200. If a data point were e.g. 23, it would be changed to 0. Make sense?
At this point, I am just doing calculating the ceiling values, and checking for outliers by eye. But this is both time consuming, and has an inherently higher risk for error. I want to make a formula that will automate these adjustments, rather than do this by eye.
Here is the approach I'm thinking of, but I may be way off:
For each data point: =IF(A2>(2*StDev(A2:A8)), 2*StDev(A2:A8), ?)
(A2:A8 is, of course, just an example)
(? is where I am stuck)
This is where I'm stuck  I have no idea how to revert the cell to its original value in the case that it ISN'T an outlier and doesn't need to be changed.
Also, I think I can figure this part out eventually, but I'll throw it out there anyway: This If statement actually needs to be 2 in 1, in that it filters both exceedingly low AND high data. I think this will require the OR statement, but I need to study it more at the moment.
Any help is appreciated!
Hello all,
I'm trying to get a standard deviation of data range, excluding the outliers. I'm excluding
I have a set of data containing outliers based on a median and mean. The data is all in one column and the median and mean are each calculated in one cell via the average and mean formulas. in another column, if a value is an outlier i have it marked as "high" "low" or "" depending on whether its high, low, or not an outlier. What i would like to do is have the mean and median functions which currently look like =AVERAGE(A4:A32) exclude data that has "high" or "low" in the same row B column.
Thanks!
I can't see the forest through the trees on this one. My spreadsheet is designed to calculate waiting periods between each step of a multistep process, i.e. Person A makes an appointment with GP, date of appointment, waiting time? Referral to specialist, date of appointment, waiting time? Diagnosis, date of procedure, waiting time. I am sure you get the drift.
My waiting time calculations utilise the NETWORKDAYS function  1 and eliminate outliers created by empty cells with =IF(AND(F16
You guys are geniuses so hopefully someone has a good idea of how I can save a lot of time.
I have one worksheet with sales history per product by day. (dates are collumns, products are rows, qtys in the middle, all are values, not formulas). (I'll be using the Excel document as a workbook for future data, too, so whatever I create needs to work no matter what I paste into this raw data sheet.)
Then, I created a worksheet that pulls over the data from the first sheet, but also adds a row of trendline... Averages the 14 previous days' qtys. (so the trendline starts on day 15.)
I would like these numbers to paste back onto the first sheet, overwriting the existing data.
Constraints:
 Every time i do this, there may be up to 500 products, so cutting & pasting manually isn't an option.
 Some of the data points are 0 in the raw data sheet. (and some outliers are way high) Ideally, these would not be calculated in the average/ trendline.
Ideas:
 I could calculate the std dev of the data for each product & for hte average formula put in an IF stmt like if the value is more than 2 stdev of mean not to calculate it? Not sure how to write that formula.
 I could create a macro that will take the average data & paste it over the raw data. (My concern here is that the products' data don't all start on the same day. Also don't knwo how the macro would know to put it in the right day.)
Help is greatly appreciated! Have a good night!
Hi there,
I have a worksheet containing data in columns. For each column, I would like to calculate the standard deviation of the mid95% of the values i.e. 2.5th percentile to 97.5th percentile (to reduce the effect of "outliers").
So first of all I calculate the values of the 2.5th and 97.5th percentiles:
A102 (2.5th percentile) = PERCENTILE(A1:A100,0.025)
A103 (97.5th percentile) = PERCENTILE(A1:A100,0.975)
Then using an array formula shamelessly plagiarized from this very website, I calculate the standard deviation of the values that lie in that range:
{=STDEV(IF((A1:A100>=A102)*(A1:A100
Does excel have a calculation for Winsorising or Winsorization
It's a statistical calculation to round up or down outliers vs trimming or removing them from the data pool.
Thanks
Hi everyone!
I've got the following code which is meant to find cells with no values, colour the cells in, and then replace the missing value with the average of that column. the average is located in a different sheet
Code:
Sub Outliers()
Dim i, j, row, col As Integer
' Prompting messages
col = Application.InputBox _
(Prompt:="Please enter the number of rows ", _
Title:="Row Number", Type:=1)
row = Application.InputBox _
(Prompt:="Please enter the number of colums ", _
Title:="Colum Number", Type:=1)
For i = 1 To row
For j = 1 To col
If IsEmpty(ActiveCell) = True Then
'Copy cell value and paste into cell with missing value
Sheets("Part1DATA").Select
Range("j" & "4").Select
Selection.Copy
Sheets("TrainingData").Select
Range("j" & "i").Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
'Make the cell Red
With Selection.Interior
.Pattern = xlSolid
.PatternColorIndex = xlAutomatic
.Color = 8420607
.TintAndShade = 0
.PatternTintAndShade = 0
End With
End If
'Move one row down
ActiveCell.Offset(1, 0).Select
Next j
'Move up to the top of the table and across
ActiveCell.Offset(col, 1).Select
Next i
It throws error 400. However, I have the same code (MissingValues), which colours it all in red, as shown on the file submitted.
The code will later be used to more or less the same extent, except for, it will search for outliers (based on outlier limit) and colouring in the outliers in yellow (hence the name of the sub)
Much appreciate any help!
Kurifodo
Hi,
I'm having a problem with my boxplots and they don't seem to show the correct position of the box itself in regard to outliers and Q1 & Q3 etc. Also I would like to narrow them down a bit if possible.
If anyone would also like to critique my other charts ie have I done them properly I would be very grateful to know.
Cheers
John
A lot of times I will have a table of financial ratios wherein there is a median value calculated as the last column. We will sometimes exclude values in the table if they are outliers. Our convention is to bold those items and then exclude that cell from the formula.
What I would like to be able to do is select the median formulas and then run a macro that would take that selection, look at the formulas in the selection and then select all of the cells that the formulas take into consideration when calculating the median values.
I would then be able to very quickly see that all of the formulas were updated properly. You can go down the list of median values manually, but it is cumbersome with a long list.
The code I have now is below. What I am struggling with is a way to determine the range the median formulas reference and then how to select that total range.
Any help with this would be appreciated. Maybe there is an easier way to accomplish the same thing.
Sub ShowRanges()
Dim rngCell As Range
Dim rngFormulaCellsOnly As Range,
Dim rngResults As Range
' Select only cells with formulas.
Set rngFormulaCellsOnly = Selection.SpecialCells(xlCellTypeFormulas)
' Loop through all the currently selected cells that contain formulas.
For Each rngCell In rngFormulaCellsOnly
' Look at formual in current cell and add to rngResults
' This is the part I am struggling with.
Next rngCell
' Select the cells
rngResults.Select
End Sub

