Module III: Statistical Techniques I – Complete Formulas with Clear Explanation
1. Measures of Central Tendency
| Measure | Formula (Raw Data) | Formula (Grouped Data) | Remarks |
|---|---|---|---|
| Arithmetic Mean (\(\bar{x}\)) | \(\bar{x} = \frac{\Sigma x_i}{n}\) | \(\bar{x} = \frac{\Sigma f_i x_i}{\Sigma f_i}\) | Most common average |
| Median | Odd \(n\): middle term after sorting; even \(n\): average of the two middle terms | Median = \( l + \left(\frac{N/2 - C}{f}\right) \times h \) | \(l\) = lower limit of the median class, \(h\) = class width, \(f\) = frequency of the median class, \(C\) = cumulative frequency before the median class |
| Mode | Value with highest frequency | Mode = \( l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h \) | \(l\) = lower limit of the modal class, \(f_1\) = frequency of the modal class, \(f_0\), \(f_2\) = frequencies of the preceding and following classes |
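The grouped-data formulas above can be sketched in Python as follows. This is a minimal illustration, assuming equal-width classes; the function name `grouped_stats` and the sample frequency table are made up for the example.

```python
def grouped_stats(lower_limits, width, freqs):
    """Mean, median and mode for a frequency table with equal-width classes."""
    n_total = sum(freqs)                              # N = sum of f_i
    mids = [l + width / 2 for l in lower_limits]      # class marks x_i
    mean = sum(f * x for f, x in zip(freqs, mids)) / n_total

    # Median class: first class whose cumulative frequency reaches N/2.
    cum = 0                                           # C in the formula
    for i, f in enumerate(freqs):
        if cum + f >= n_total / 2:
            median = lower_limits[i] + (n_total / 2 - cum) * width / f
            break
        cum += f

    # Modal class: class with the highest frequency.
    m = max(range(len(freqs)), key=lambda i: freqs[i])
    f1 = freqs[m]
    f0 = freqs[m - 1] if m > 0 else 0                 # preceding class
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0    # following class
    mode = lower_limits[m] + (f1 - f0) * width / (2 * f1 - f0 - f2)
    return mean, median, mode


# Hypothetical table: classes 0-10, 10-20, ..., 40-50
mean, median, mode = grouped_stats([0, 10, 20, 30, 40], 10, [4, 8, 20, 12, 6])
print(mean, median, mode)
```

For this table the median class and the modal class are both 20-30, and the three averages come out close to 26.6, 26.5 and 26 respectively.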
2. Moments
- Raw moment about origin (\(\mu'_r\)):
\(\mu'_r = \frac{\Sigma x_i^r}{n}\) (ungrouped)
\(\mu'_r = \frac{\Sigma f_i x_i^r}{N}\) (grouped)
- Central moment about mean (\(\mu_r\)):
\(\mu_r = \frac{\Sigma (x_i - \bar{x})^r}{n}\) (ungrouped)
\(\mu_r = \frac{\Sigma f_i (x_i - \bar{x})^r}{N}\) (grouped)
Important central moments:
- \(\mu_1 = 0\) (always)
- \(\mu_2 =\) Variance \(= \sigma^2\)
- \(\mu_3 \to\) used for skewness
- \(\mu_4 \to\) used for kurtosis
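As a quick check of the ungrouped formulas, the sketch below computes raw and central moments for a small made-up data set. The helper names `raw_moment` and `central_moment` are illustrative only.

```python
def raw_moment(xs, r):
    """r-th raw moment about the origin (ungrouped data)."""
    return sum(x ** r for x in xs) / len(xs)


def central_moment(xs, r):
    """r-th central moment about the mean (ungrouped data)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** r for x in xs) / len(xs)


data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical sample, mean = 5
mu1 = central_moment(data, 1)            # always 0
mu2 = central_moment(data, 2)            # variance
mu3 = central_moment(data, 3)
mu4 = central_moment(data, 4)
print(mu1, mu2, mu3, mu4)

# Standard identity: variance = second raw moment minus squared mean
print(raw_moment(data, 2) - raw_moment(data, 1) ** 2)
```

Here \(\mu_2 = 4\), \(\mu_3 = 5.25\) and \(\mu_4 = 44.5\), and the last line reproduces the variance from the raw moments.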
3. Moment Generating Function (M.G.F.)
Definition:
\[ M(t) = E(e^{tx}) = \Sigma e^{tx} p(x) \quad \text{(discrete)} \]
\[ M(t) = \int e^{tx} f(x)\, dx \quad \text{(continuous)} \]
Properties:
- \(M(0) = 1\)
- Raw moments: \(\mu'_r = \frac{d^r M(t)}{dt^r} \bigg|_{t=0}\)
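- Cumulant generating function (C.G.F.): \(K(t) = \ln M(t)\). Its derivatives at \(t = 0\) give the cumulants: \(\kappa_1\) = mean, \(\kappa_2 = \mu_2\), \(\kappa_3 = \mu_3\), \(\kappa_4 = \mu_4 - 3\mu_2^2\)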
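A short worked example may help here. It assumes a Bernoulli variable (chosen only for illustration, not taken from the syllabus text) with \(P(X = 1) = p\) and \(P(X = 0) = q = 1 - p\):

\[ M(t) = q e^{0} + p e^{t} = q + p e^{t}, \qquad M(0) = q + p = 1 \]
\[ \mu'_1 = M'(0) = p, \qquad \mu'_2 = M''(0) = p \]
\[ \mu_2 = \mu'_2 - (\mu'_1)^2 = p - p^2 = pq \]

So the mean is \(p\) and the variance is \(pq\), both obtained by differentiating \(M(t)\) at \(t = 0\).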
4. Skewness (Measure of Asymmetry)
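- Karl Pearson’s moment coefficients:
\(\beta_1 = \frac{\mu_3^2}{\mu_2^3}\)
\(\gamma_1 = \pm\sqrt{\beta_1} = \frac{\mu_3}{\sigma^3}\) (sign taken from \(\mu_3\); the range ≈ –3 to +3 applies to Pearson’s mean-median coefficient \(\frac{3(\text{Mean} - \text{Median})}{\sigma}\), not to \(\gamma_1\))
- Bowley’s coefficient (quartile based):
Skewness = \(\frac{Q_3 + Q_1 - 2 \text{Median}}{Q_3 - Q_1}\) (range –1 to +1)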
Interpretation:
- \(\gamma_1 > 0 \to\) positively skewed (tail on right)
- \(\gamma_1 < 0 \to\) negatively skewed
- \(\gamma_1 = 0 \to\) symmetric
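The two skewness measures can be sketched as below. The function names and sample values are hypothetical, and the quartiles are passed in directly because textbooks differ on how to compute them.

```python
def central_moment(xs, r):
    """r-th central moment about the mean (ungrouped data)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** r for x in xs) / len(xs)


def moment_skewness(xs):
    """Return (beta_1, gamma_1) from the central moments."""
    mu2 = central_moment(xs, 2)
    mu3 = central_moment(xs, 3)
    beta1 = mu3 ** 2 / mu2 ** 3
    gamma1 = mu3 / mu2 ** 1.5        # sigma^3 = mu2^(3/2); keeps the sign of mu3
    return beta1, gamma1


def bowley_skewness(q1, median, q3):
    """Quartile-based coefficient; quartiles are supplied by the caller."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


data = [2, 4, 4, 4, 5, 5, 7, 9]              # hypothetical sample
beta1, gamma1 = moment_skewness(data)
mirrored = [-x for x in data]                # same shape, tail flipped to the left
beta1_m, gamma1_m = moment_skewness(mirrored)
print(gamma1, gamma1_m)                      # equal size, opposite sign
print(bowley_skewness(20, 28, 40))           # assumed Q1 = 20, Median = 28, Q3 = 40
```

Mirroring the data leaves \(\beta_1\) unchanged but flips the sign of \(\gamma_1\), which is why the direction of skewness is read from \(\gamma_1\) (or \(\mu_3\)) rather than from \(\beta_1\).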
5. Kurtosis (Measure of Peakedness)
- Coefficient:
\(\beta_2 = \frac{\mu_4}{\mu_2^2}\)
\(\gamma_2 = \beta_2 - 3\)
Interpretation:
- \(\gamma_2 > 0 \to\) Leptokurtic (sharper than normal)
- \(\gamma_2 < 0 \to\) Platykurtic (flatter)
- \(\gamma_2 = 0 \to\) Mesokurtic (normal curve)
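A minimal sketch of the kurtosis calculation and its interpretation follows. The function name `kurtosis` and the sample are illustrative, and the exact comparison with zero is only suitable for hand-sized examples.

```python
def central_moment(xs, r):
    """r-th central moment about the mean (ungrouped data)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** r for x in xs) / len(xs)


def kurtosis(xs):
    """Return (beta_2, gamma_2, label) using beta_2 = mu_4 / mu_2^2."""
    mu2 = central_moment(xs, 2)
    mu4 = central_moment(xs, 4)
    beta2 = mu4 / mu2 ** 2
    gamma2 = beta2 - 3               # excess over the normal curve's beta_2 = 3
    if gamma2 > 0:
        label = "leptokurtic"
    elif gamma2 < 0:
        label = "platykurtic"
    else:
        label = "mesokurtic"
    return beta2, gamma2, label


data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical sample
beta2, gamma2, label = kurtosis(data)
print(beta2, gamma2, label)
```

For this sample \(\beta_2 = 44.5 / 16 = 2.78125\), so \(\gamma_2 < 0\) and the distribution is classed as platykurtic.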
6. Curve Fitting – Method of Least Squares
Principle: Minimize \(\Sigma (y_i - Y_i)^2\) where \(Y_i\) = predicted value.
| Curve Type | Normal Equations | Final Equation |
|---|---|---|
| Straight line: \(y = a + bx\) | \(\Sigma y = na + b\Sigma x\); \(\Sigma xy = a\Sigma x + b\Sigma x^2\) | \(b = \frac{n\Sigma xy - \Sigma x \Sigma y}{n\Sigma x^2 - (\Sigma x)^2}\); \(a = \bar{y} - b\bar{x}\) |
| Parabola: \(y = a + bx + cx^2\) | \(\Sigma y = na + b\Sigma x + c\Sigma x^2\); \(\Sigma xy = a\Sigma x + b\Sigma x^2 + c\Sigma x^3\); \(\Sigma x^2 y = a\Sigma x^2 + b\Sigma x^3 + c\Sigma x^4\) | Solve the three equations for \(a\), \(b\), \(c\) |
| Exponential: \(y = a e^{bx}\) | Take ln: \(\ln y = \ln a + bx\); let \(Y = \ln y\) and \(A = \ln a\), then fit \(Y = A + bx\) | Same as straight line on transformed data; recover \(a = e^{A}\) |
| Geometric: \(y = ax^b\) | \(\ln y = \ln a + b \ln x\) | Fit a straight line between \(\ln y\) and \(\ln x\) |
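The straight-line and exponential rows can be sketched in Python as below. This is an illustration under assumed data, with hypothetical function names; the parabola case is left out because it needs a 3×3 linear solve.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x via the closed-form normal-equation solution."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n          # a = y_bar - b * x_bar
    return a, b


def fit_exponential(xs, ys):
    """Fit y = a * exp(b*x) by fitting a line to (x, ln y); requires y > 0."""
    big_a, b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(big_a), b        # a = e^A


# Hypothetical data for the straight line
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(a, b)

# Data generated from y = 2 * exp(0.5 x), so the fit should recover a = 2, b = 0.5
ys_exp = [2 * math.exp(0.5 * x) for x in xs]
a_exp, b_exp = fit_exponential(xs, ys_exp)
print(a_exp, b_exp)
```

For the first data set the sums are \(\Sigma x = 15\), \(\Sigma y = 20\), \(\Sigma xy = 66\), \(\Sigma x^2 = 55\), giving \(b = 30/50 = 0.6\) and \(a = 4 - 0.6 \times 3 = 2.2\). Note that the log transform minimizes squared error in \(\ln y\), not in \(y\) itself.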
7. Correlation Analysis
- Karl Pearson’s coefficient of correlation (\(r\)):
\[ r = \frac{\Sigma (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\Sigma (x_i - \bar{x})^2 \Sigma (y_i - \bar{y})^2}} = \frac{n\Sigma xy - \Sigma x \Sigma y}{\sqrt{(n\Sigma x^2 - (\Sigma x)^2)(n\Sigma y^2 - (\Sigma y)^2)}} \]
- Properties: \(-1 \leq r \leq 1\)
- Rank Correlation (Spearman’s):
\(\rho = 1 - \frac{6 \Sigma d_i^2}{n(n^2 - 1)}\)
where \(d_i =\) Rank\(_x\) – Rank\(_y\) for the \(i\)-th pair and \(n\) = number of pairs (this form assumes no tied ranks)
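Both coefficients can be sketched as follows. The data and function names are made up for illustration, and the ranking helper assumes there are no tied values.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r using the computational (sums) form."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def ranks(values):
    """Rank 1 = smallest value. Assumes all values are distinct (no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Spearman's rank correlation from the sum of squared rank differences."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
rho = spearman_rho([10, 20, 30, 40, 50], [15, 12, 30, 25, 40])
print(r, rho)
```

In the second data set the rank differences are \(-1, 1, -1, 1, 0\), so \(\Sigma d^2 = 4\) and \(\rho = 1 - 24/120 = 0.8\). The first data set gives \(r = 30/\sqrt{1500} \approx 0.775\).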
8. Regression Analysis
Regression equation of y on x: \(y = a + b x\)
- \(b = r \cdot \frac{\sigma_y}{\sigma_x} = \frac{n\Sigma xy - \Sigma x \Sigma y}{n\Sigma x^2 - (\Sigma x)^2}\)
- \(a = \bar{y} - b \bar{x}\)
Important Properties:
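- \(b_{yx} \cdot b_{xy} = r^2\), where \(b_{yx} = r \cdot \frac{\sigma_y}{\sigma_x}\) and \(b_{xy} = r \cdot \frac{\sigma_x}{\sigma_y}\) are the regression coefficients of y on x and of x on y
- \(r = \pm\sqrt{b_{yx} \cdot b_{xy}}\) (sign same as the b’s, which always share a sign)
- \(|r|\) is the geometric mean of the two regression coefficients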
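The relation between the two regression coefficients and \(r\) can be checked numerically. The sketch below uses hypothetical data; the x-on-y slope is obtained by swapping the roles of x and y in the slope formula.

```python
import math


def slope(us, vs):
    """Regression coefficient of v on u: (n*Suv - Su*Sv) / (n*Suu - Su^2)."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(u * v for u, v in zip(us, vs))
    suu = sum(u * u for u in us)
    return (n * suv - su * sv) / (n * suu - su ** 2)


xs = [1, 2, 3, 4, 5]                 # hypothetical data
ys = [2, 4, 5, 4, 5]

b_yx = slope(xs, ys)                 # y on x
b_xy = slope(ys, xs)                 # x on y (roles swapped)
a_yx = sum(ys) / len(ys) - b_yx * sum(xs) / len(xs)   # intercept of y = a + b*x

# r takes the common sign of the two regression coefficients
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(a_yx, b_yx, b_xy, r)
```

Here \(b_{yx} = 0.6\) and \(b_{xy} = 1\), so \(r^2 = 0.6\) and \(r \approx 0.775\), matching the value obtained from Pearson's formula on the same data.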
Summary Table of Key Formulas
| Concept | Formula |
|---|---|
| Mean | \(\bar{x} = \Sigma x / n\) |
| Variance | \(\sigma^2 = \Sigma (x - \bar{x})^2 / n\) |
| Pearson’s r | \(r = \frac{n\Sigma xy - \Sigma x\Sigma y}{\sqrt{(n\Sigma x^2-(\Sigma x)^2)(n\Sigma y^2-(\Sigma y)^2)}}\) |
| Rank Correlation \(\rho\) | \(1 - \frac{6\Sigma d^2}{n(n^2-1)}\) |
| Regression slope (y on x) | \(b = \frac{n\Sigma xy - \Sigma x\Sigma y}{n\Sigma x^2 - (\Sigma x)^2}\) |
| Skewness (\(\gamma_1\)) | \(\mu_3 / \sigma^3\) |
| Kurtosis (\(\gamma_2\)) | \(\mu_4 / \sigma^4 - 3\) |
| Relation between r and b’s | \(r^2 = b_{yx} \cdot b_{xy}\) |
These are all the standard formulas and concepts from Module III (Statistical Techniques I) as per most Indian university syllabi (Anna University, Mumbai University, etc.). Practice numerical problems using these exact formulas for best exam performance.