Variability is most commonly measured with the following descriptive statistics:
Range: the difference between the highest and lowest values.
Interquartile range: the range of the middle half of a distribution.
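Standard deviation: the square root of the variance; roughly, the typical distance of values from the mean.
Variance: the average of the squared distances from the mean (R's var() divides by n - 1 rather than n).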
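All four measures can be computed with base R. The sketch below applies them to the sample vector used in the range examples. Note the assumptions built into R's defaults: IQR() uses quantile type 7, and var() and sd() are the sample versions that divide by n - 1, so "average" in the list above is an approximation.

```r
# Sample data: the same vector used for the range examples
x <- c(12, 15, 18, 25, 30, 35)

range_value <- max(x) - min(x)  # 23
iqr_value   <- IQR(x)           # 13 (75th minus 25th percentile, type 7)
var_value   <- var(x)           # 81.1 (sum of squared deviations / (n - 1))
sd_value    <- sd(x)            # about 9.006, the square root of the variance

# Check var() against the textbook formula with the n - 1 denominator
n <- length(x)
manual_var <- sum((x - mean(x))^2) / (n - 1)
stopifnot(isTRUE(all.equal(var_value, manual_var)))
stopifnot(isTRUE(all.equal(sd_value, sqrt(var_value))))

c(range = range_value, IQR = iqr_value, sd = sd_value, var = var_value)
```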
x <- c(12, 15, 18, 25, 30, 35)
range_value <- max(x) - min(x)
range_value
[1] 23
Using the range() function
x <- c(12, 15, 18, 25, 30, 35)
range(x)
[1] 12 35
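range() returns a length-two vector holding the minimum and maximum, not the spread itself. A small sketch of getting the single-number range from it: diff() subtracts consecutive elements, so applied to c(min, max) it returns max - min.

```r
x <- c(12, 15, 18, 25, 30, 35)

# range() gives c(min, max) = c(12, 35)
r <- range(x)

# diff() on a length-two vector returns r[2] - r[1]
spread <- diff(r)
spread                              # 23

# Same value as the max() - min() approach
identical(spread, max(x) - min(x))  # TRUE
```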
Using range() with NA values
With na.rm = TRUE, missing values are removed before the minimum and maximum are found:
x <- c(10, 15, NA, 25, 30, NA, 40)
range(x, na.rm = TRUE)
[1] 10 40
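For comparison, the sketch below shows what happens when na.rm is left at its default of FALSE: a single NA makes both ends of the range unknown. It also combines na.rm = TRUE with diff() to get the spread as one number.

```r
x <- c(10, 15, NA, 25, 30, NA, 40)

# Default na.rm = FALSE: min and max are both NA
with_na <- range(x)
with_na                                  # NA NA

# Drop the NAs, then turn c(min, max) into max - min
spread <- diff(range(x, na.rm = TRUE))
spread                                   # 30
```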
Range in a data frame
df <- data.frame(
  A = c(10, 15, 20, 25, 30),
  B = c(5, 7, 9, 12, 15),
  C = c(100, 120, 110, 130, 125),
  Gender = c("m", "f", "f", "m", "f")
)
df
   A  B   C Gender
1 10  5 100      m
2 15  7 120      f
3 20  9 110      f
4 25 12 130      m
5 30 15 125      f
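Using sapply() on the numeric columns
The inner sapply(df, is.numeric) selects the numeric columns; the outer sapply() computes max - min for each of them.
range_df <- sapply(df[, sapply(df, is.numeric)], function(x) max(x) - min(x))
range_df
 A  B  C
20 10 30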
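One caveat with df[, sapply(df, is.numeric)]: if exactly one column is numeric, the [ , ] indexing drops the result to a plain vector, and sapply() would then loop over individual values instead of columns. A possible alternative, sketched below, uses Filter(), which always returns a data frame, together with vapply(), which checks that each column yields exactly one number. The variable name num_cols is only illustrative.

```r
df <- data.frame(
  A = c(10, 15, 20, 25, 30),
  B = c(5, 7, 9, 12, 15),
  C = c(100, 120, 110, 130, 125),
  Gender = c("m", "f", "f", "m", "f")
)

# Keep only numeric columns; the result stays a data frame
num_cols <- Filter(is.numeric, df)

# One numeric value per column, named after the column
range_df <- vapply(num_cols, function(col) diff(range(col)), numeric(1))
range_df  # A = 20, B = 10, C = 30
```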
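Using aggregate() with a categorical variable
aggregate() splits the rows by Gender and computes the range of each column within each group:
aggregate(cbind(A, B, C) ~ Gender, data = df,
          FUN = function(x) max(x) - min(x))
  Gender  A B  C
1      f 15 8 15
2      m 15 7 30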
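For a single column, tapply() gives the same per-group range without a formula. A minimal sketch for column A, using diff(range()) as the range function; the result name a_by_gender is only illustrative.

```r
df <- data.frame(
  A = c(10, 15, 20, 25, 30),
  B = c(5, 7, 9, 12, 15),
  C = c(100, 120, 110, 130, 125),
  Gender = c("m", "f", "f", "m", "f")
)

# Split column A by Gender and take max - min in each group
a_by_gender <- tapply(df$A, df$Gender, function(v) diff(range(v)))
a_by_gender  # f = 15, m = 15 (matches column A of the aggregate() result)
```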
Range in a data frame with NA values
df <- data.frame(
  A = c(10, 15, 20, 25, 30),
  B = c(5, NA, 9, 12, 15),
  C = c(100, 120, 110, 130, NA),
  Gender = c("m", "f", "f", "m", "f")
)
df
   A  B   C Gender
1 10  5 100      m
2 15 NA 120      f
3 20  9 110      f
4 25 12 130      m
5 30 15  NA      f
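Column-wise ranges on this data only need na.rm = TRUE inside the function: each column drops its own NA, so the other values in the same row still count. A sketch using the sapply() idea from earlier; single-bracket indexing without a comma, df[...], is assumed here so the selection stays a data frame.

```r
df <- data.frame(
  A = c(10, 15, 20, 25, 30),
  B = c(5, NA, 9, 12, 15),
  C = c(100, 120, 110, 130, NA),
  Gender = c("m", "f", "f", "m", "f")
)

# df[logical] selects columns and never drops to a vector
range_na <- sapply(df[sapply(df, is.numeric)],
                   function(col) diff(range(col, na.rm = TRUE)))
range_na  # A = 20, B = 10, C = 30
```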
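To handle missing values, pass na.rm = TRUE to max() and min() inside the function:
aggregate(cbind(A, B, C) ~ Gender, data = df,
          FUN = function(x) max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
  Gender  A B  C
1      f  0 0  0
2      m 15 7 30
The zeros for group f are not real ranges. The formula interface of aggregate() uses na.action = na.omit by default, so rows 2 and 5 (each containing one NA) are dropped before FUN is called. Group f is left with row 3 only, whose range is 0 in every column, and na.rm = TRUE never comes into play.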
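To keep the rows that contain NA and let na.rm = TRUE do its job, one option is to pass na.action = na.pass to aggregate(). Every row then reaches FUN, and each column drops only its own missing values. A sketch:

```r
df <- data.frame(
  A = c(10, 15, 20, 25, 30),
  B = c(5, NA, 9, 12, 15),
  C = c(100, 120, 110, 130, NA),
  Gender = c("m", "f", "f", "m", "f")
)

# na.pass keeps NA rows; na.rm = TRUE then ignores NAs column by column
res <- aggregate(cbind(A, B, C) ~ Gender, data = df,
                 FUN = function(x) max(x, na.rm = TRUE) - min(x, na.rm = TRUE),
                 na.action = na.pass)
res
#   Gender  A B  C
# 1      f 15 6 10
# 2      m 15 7 30
```

Group f now uses all three of its rows: B spans 9 to 15 and C spans 110 to 120.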