Module III: Statistical Techniques I – Complete Formulas with Clear Explanation

1. Measures of Central Tendency

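  • Arithmetic mean (\(\bar{x}\)), the most common average:
    Raw data: \(\bar{x} = \frac{\Sigma x_i}{n}\)
    Grouped data: \(\bar{x} = \frac{\Sigma f_i x_i}{\Sigma f_i}\), where \(x_i\) is the class mid-point
  • Median:
    Raw data: arrange the values in order; for odd n take the middle term, for even n take the average of the two middle terms
    Grouped data: Median = \( l + \left(\frac{N/2 - C}{f}\right) \times h \)
    where l = lower limit of the median class, h = class width, f = frequency of the median class, C = cumulative frequency before the median class, \(N = \Sigma f_i\)
  • Mode:
    Raw data: the value with the highest frequency
    Grouped data: Mode = \( l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h \)
    where l = lower limit of the modal class, h = class width, \(f_1\) = frequency of the modal class, \(f_0\) and \(f_2\) = frequencies of the classes just before and just after it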
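
The grouped-data formulas above can be checked numerically. The following is a minimal Python sketch, assuming a made-up frequency table with equal class widths; the function names are illustrative and not part of any syllabus.

```python
# Grouped frequency table: (lower limit, upper limit, frequency).
# The figures are invented purely for illustration.
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 10), (40, 50, 5)]

def grouped_mean(classes):
    """Mean = sum(f_i * x_i) / sum(f_i), with x_i the class mid-point."""
    total_f = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / total_f

def grouped_median(classes):
    """Median = l + ((N/2 - C) / f) * h, applied to the median class."""
    n_total = sum(f for _, _, f in classes)
    cumulative = 0  # C: cumulative frequency before the current class
    for lo, hi, f in classes:
        if cumulative + f >= n_total / 2:  # first class reaching N/2
            return lo + (n_total / 2 - cumulative) / f * (hi - lo)
        cumulative += f

def grouped_mode(classes):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h, applied to the modal class."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                      # modal class index
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                # preceding class
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0   # following class
    lo, hi, _ = classes[k]
    return lo + (f1 - f0) / (2 * f1 - f0 - f2) * (hi - lo)

print(grouped_mean(classes))    # 25.5
print(grouped_median(classes))  # about 25.83
print(grouped_mode(classes))    # about 26.67
```

Here N = 40, so the median class is 20 to 30 (C = 13, f = 12), and the same class is the modal class (f1 = 12, f0 = 8, f2 = 10).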

2. Moments

  • Raw moment about origin:
    \(\mu'_r = \frac{\Sigma x_i^r}{n}\) (ungrouped)
    \(\mu'_r = \frac{\Sigma f_i x_i^r}{N}\) (grouped)
  • Central moment about mean (\(\mu_r\)):
    \(\mu_r = \frac{\Sigma (x_i - \bar{x})^r}{n}\) (ungrouped)
    \(\mu_r = \frac{\Sigma f_i (x_i - \bar{x})^r}{N}\) (grouped)

Important central moments:

  • \(\mu_1 = 0\) (always)
  • \(\mu_2 =\) Variance \(= \sigma^2\)
  • \(\mu_3 \to\) used for skewness
  • \(\mu_4 \to\) used for kurtosis
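
As a quick check of these definitions, here is a small Python sketch for ungrouped data. The sample values and function names are assumptions chosen for illustration.

```python
def raw_moment(data, r):
    """r-th raw moment about the origin: sum(x_i ** r) / n."""
    return sum(x ** r for x in data) / len(data)

def central_moment(data, r):
    """r-th central moment about the mean: sum((x_i - mean) ** r) / n."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** r for x in data) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative sample, mean = 5

print(central_moment(data, 1))  # 0.0   (first central moment is always zero)
print(central_moment(data, 2))  # 4.0   (variance)
print(central_moment(data, 3))  # 5.25  (used for skewness)
print(central_moment(data, 4))  # 44.5  (used for kurtosis)

# The variance can also be obtained from raw moments: mu_2 = mu'_2 - (mu'_1)^2.
print(raw_moment(data, 2) - raw_moment(data, 1) ** 2)  # 4.0
```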

3. Moment Generating Function (M.G.F.)

Definition:

\[ M(t) = E(e^{tx}) = \Sigma e^{tx} p(x) \quad \text{(discrete)} \]
\[ M(t) = \int e^{tx} f(x)\, dx \quad \text{(continuous)} \]

Properties:

  • \(M(0) = 1\)
  • Raw moments: \(\mu'_r = \frac{d^r M(t)}{dt^r} \bigg|_{t=0}\)
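  • Cumulant generating function (C.G.F.): \(K(t) = \ln M(t)\). Its derivatives at \(t = 0\) give the cumulants \(\kappa_r\), which yield the central moments: \(\kappa_1 =\) mean, \(\kappa_2 = \mu_2\), \(\kappa_3 = \mu_3\), \(\kappa_4 = \mu_4 - 3\mu_2^2\)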
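
The derivative property can be illustrated numerically. The sketch below assumes a fair six-sided die as the distribution and approximates the derivatives of M(t) at t = 0 by central finite differences; the step size h is an arbitrary choice, so the results are approximate rather than exact.

```python
import math

# Fair six-sided die: p(x) = 1/6 for x = 1..6 (illustrative choice).
pmf = {x: 1 / 6 for x in range(1, 7)}

def mgf(t):
    """M(t) = sum of e^(t*x) * p(x) over all x (discrete case)."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

h = 1e-4  # finite-difference step (an assumption, not a prescribed value)

first_raw = (mgf(h) - mgf(-h)) / (2 * h)               # approximates M'(0)  = E(X)
second_raw = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h ** 2  # approximates M''(0) = E(X^2)
variance = second_raw - first_raw ** 2

print(mgf(0))      # 1 up to rounding, as required by M(0) = 1
print(first_raw)   # close to 3.5
print(second_raw)  # close to 91/6, about 15.1667
print(variance)    # close to 35/12, about 2.9167
```

In an exam the derivatives are taken analytically; the numerical version is only a way to confirm the answer.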

4. Skewness (Measure of Asymmetry)

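  • Moment coefficient of skewness (Karl Pearson’s \(\beta\) and \(\gamma\) coefficients):
    \(\beta_1 = \frac{\mu_3^2}{\mu_2^3}\)
    \(\gamma_1 = \pm\sqrt{\beta_1} = \frac{\mu_3}{\sigma^3}\) (sign taken from \(\mu_3\))
  • Karl Pearson’s coefficient of skewness:
    \(S_k = \frac{\text{Mean} - \text{Mode}}{\sigma} \approx \frac{3(\text{Mean} - \text{Median})}{\sigma}\) (the second form lies between –3 and +3)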
  • Bowley’s coefficient (quartile based):
    Skewness = \(\frac{Q_3 + Q_1 - 2 \text{Median}}{Q_3 - Q_1}\)

Interpretation:

  • \(\gamma_1 > 0 \to\) positively skewed (tail on right)
  • \(\gamma_1 < 0 \to\) negatively skewed
  • \(\gamma_1 = 0 \to\) symmetric
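
The two coefficients can be compared on a small sample. This Python sketch uses the standard-library `statistics.quantiles` function for the quartiles; its default method corresponds to locating the \((n+1)/4\)-th item, and other quartile conventions may give slightly different values. The data and function names are illustrative assumptions.

```python
import statistics

def moment_skewness(data):
    """gamma_1 = mu_3 / sigma^3, using central moments with divisor n."""
    n = len(data)
    mean = sum(data) / n
    mu2 = sum((x - mean) ** 2 for x in data) / n
    mu3 = sum((x - mean) ** 3 for x in data) / n
    return mu3 / mu2 ** 1.5

def bowley_skewness(data):
    """(Q3 + Q1 - 2*Median) / (Q3 - Q1), quartiles from statistics.quantiles."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return (q3 + q1 - 2 * q2) / (q3 - q1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative sample

print(moment_skewness(data))  # 0.65625 -> positively skewed
print(bowley_skewness(data))  # 0.6     -> positively skewed
```

Both measures agree on the sign here, although their magnitudes are not directly comparable.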

5. Kurtosis (Measure of Peakedness)

  • Coefficient:
    \(\beta_2 = \frac{\mu_4}{\mu_2^2}\)
    \(\gamma_2 = \beta_2 - 3\)

Interpretation:

  • \(\gamma_2 > 0 \to\) Leptokurtic (sharper than normal)
  • \(\gamma_2 < 0 \to\) Platykurtic (flatter)
  • \(\gamma_2 = 0 \to\) Mesokurtic (normal curve)
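
A short Python sketch of the kurtosis calculation and its interpretation follows. The sample, the function names, and the tolerance used to decide "equal to zero" are assumptions for illustration.

```python
def kurtosis_coefficients(data):
    """Return (beta_2, gamma_2) with beta_2 = mu_4 / mu_2^2 and gamma_2 = beta_2 - 3."""
    n = len(data)
    mean = sum(data) / n
    mu2 = sum((x - mean) ** 2 for x in data) / n
    mu4 = sum((x - mean) ** 4 for x in data) / n
    beta2 = mu4 / mu2 ** 2
    return beta2, beta2 - 3

def classify(gamma2, tol=1e-9):
    """Map the sign of gamma_2 to the usual textbook label."""
    if gamma2 > tol:
        return "leptokurtic"
    if gamma2 < -tol:
        return "platykurtic"
    return "mesokurtic"

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative sample
beta2, gamma2 = kurtosis_coefficients(data)

print(beta2)             # 2.78125
print(gamma2)            # -0.21875
print(classify(gamma2))  # platykurtic
```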

6. Curve Fitting – Method of Least Squares

Principle: Minimize \(\Sigma (y_i - Y_i)^2\) where \(Y_i\) = predicted value.

  • Straight line: \(y = a + bx\)
    Normal equations:
    \(\Sigma y = na + b\Sigma x\)
    \(\Sigma xy = a\Sigma x + b\Sigma x^2\)
    Solution:
    \(b = \frac{n\Sigma xy - \Sigma x \Sigma y}{n\Sigma x^2 - (\Sigma x)^2}\)
    \(a = \bar{y} - b\bar{x}\)
  • Parabola: \(y = a + bx + cx^2\)
    Normal equations:
    \(\Sigma y = na + b\Sigma x + c\Sigma x^2\)
    \(\Sigma xy = a\Sigma x + b\Sigma x^2 + c\Sigma x^3\)
    \(\Sigma x^2 y = a\Sigma x^2 + b\Sigma x^3 + c\Sigma x^4\)
    Solve the three equations simultaneously for a, b and c
  • Exponential: \(y = a e^{bx}\)
    Take ln: \(\ln y = \ln a + bx\)
    Let \(Y = \ln y\) and \(A = \ln a\), then fit the straight line \(Y = A + bx\) to the transformed data and recover \(a = e^A\)
  • Geometric (power) curve: \(y = ax^b\)
    Take ln: \(\ln y = \ln a + b \ln x\)
    Fit a straight line between \(\ln y\) and \(\ln x\); the slope is b and the intercept is \(\ln a\)
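
The straight-line solution and the log transformations can be sketched in Python as follows. The data sets and function names are invented for illustration; the exponential and power examples use exact values so that the recovered constants are easy to verify.

```python
import math

def fit_line(xs, ys):
    """Least-squares line y = a + b*x from the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n  # a = y_bar - b * x_bar
    return a, b

def fit_exponential(xs, ys):
    """Fit y = a * e^(b*x) by fitting ln y = ln a + b*x (requires y > 0)."""
    big_a, b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(big_a), b

def fit_power(xs, ys):
    """Fit y = a * x^b by fitting ln y = ln a + b*ln x (requires x, y > 0)."""
    big_a, b = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(big_a), b

xs = [1, 2, 3, 4, 5]

print(fit_line(xs, [2, 4, 5, 4, 5]))                        # a close to 2.2, b close to 0.6
print(fit_exponential(xs, [2 * math.exp(0.5 * x) for x in xs]))  # a close to 2, b close to 0.5
print(fit_power(xs, [3 * x ** 2 for x in xs]))              # a close to 3, b close to 2
```

Note that fitting on transformed data minimizes the squared error in \(\ln y\), not in y itself, so the result can differ slightly from a direct nonlinear fit on noisy data.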

7. Correlation Analysis

  • Karl Pearson’s coefficient of correlation (\(r\)):
    \[ r = \frac{\Sigma (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\Sigma (x_i - \bar{x})^2 \Sigma (y_i - \bar{y})^2}} = \frac{n\Sigma xy - \Sigma x \Sigma y}{\sqrt{(n\Sigma x^2 - (\Sigma x)^2)(n\Sigma y^2 - (\Sigma y)^2)}} \]
  • Properties: \(-1 \leq r \leq 1\)
  • Rank Correlation (Spearman’s), valid when there are no tied ranks:
    \(\rho = 1 - \frac{6 \Sigma d_i^2}{n(n^2 - 1)}\)
    where \(d_i = \text{Rank}(x_i) - \text{Rank}(y_i)\)
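
Both coefficients can be computed directly from the formulas above. The Python sketch below assumes small invented data sets and, for the rank correlation, that no values are tied; the helper names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r using the computational (sums) form."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def ranks(values):
    """Rank 1 = smallest value. Assumes all values are distinct (no ties)."""
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]

def spearman_rho(xs, ys):
    """rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), with d_i the rank difference."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))            # about 0.7746
print(spearman_rho([10, 20, 30, 40, 50], [12, 25, 20, 48, 40]))  # 0.8
```

In the second example the y ranks are 1, 3, 2, 5, 4, so \(\Sigma d_i^2 = 4\) and \(\rho = 1 - 24/120 = 0.8\).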

8. Regression Analysis

Regression equation of y on x: \(y = a + b x\)

  • \(b = r \cdot \frac{\sigma_y}{\sigma_x} = \frac{n\Sigma xy - \Sigma x \Sigma y}{n\Sigma x^2 - (\Sigma x)^2}\)
  • \(a = \bar{y} - b \bar{x}\)

Important Properties:

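  • \(b_{yx} \cdot b_{xy} = r^2\), where \(b_{yx} = r \cdot \frac{\sigma_y}{\sigma_x}\) is the regression coefficient of y on x and \(b_{xy} = r \cdot \frac{\sigma_x}{\sigma_y}\) is the regression coefficient of x on y
  • \(r = \pm\sqrt{b_{yx} \cdot b_{xy}}\), taking the common sign of the two regression coefficients
  • The correlation coefficient is therefore the geometric mean of the two regression coefficients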
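
These properties can be verified on a small data set. The following Python sketch computes both regression coefficients from sums and recovers r from them; the data and the function name are assumptions for illustration.

```python
import math

def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy): slopes of the y-on-x and x-on-y regression lines."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    b_yx = numerator / (n * sxx - sx ** 2)
    b_xy = numerator / (n * syy - sy ** 2)
    return b_yx, b_xy

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b_yx, b_xy = regression_coefficients(xs, ys)
# r is the geometric mean of the two coefficients, carrying their common sign.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
a = y_bar - b_yx * x_bar  # intercept of the y-on-x line

print(b_yx, b_xy)  # close to 0.6 and 1.0
print(r)           # about 0.7746, so r^2 is close to 0.6
print(a)           # close to 2.2, giving y = 2.2 + 0.6 x
```

Both regression lines pass through the point \((\bar{x}, \bar{y})\), which is (3, 4) for this data.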

Summary Table of Key Formulas

  • Mean: \(\bar{x} = \Sigma x / n\)
  • Variance: \(\sigma^2 = \Sigma (x - \bar{x})^2 / n\)
  • Pearson’s r: \(r = \frac{n\Sigma xy - \Sigma x\Sigma y}{\sqrt{(n\Sigma x^2-(\Sigma x)^2)(n\Sigma y^2-(\Sigma y)^2)}}\)
  • Rank correlation: \(\rho = 1 - \frac{6\Sigma d^2}{n(n^2-1)}\)
  • Regression slope (y on x): \(b = \frac{n\Sigma xy - \Sigma x\Sigma y}{n\Sigma x^2 - (\Sigma x)^2}\)
  • Skewness: \(\gamma_1 = \mu_3 / \sigma^3\)
  • Kurtosis: \(\gamma_2 = \mu_4 / \sigma^4 - 3\)
  • Relation between r and the b’s: \(r^2 = b_{yx} \cdot b_{xy}\)

These formulas cover the standard content of Module III (Statistical Techniques I) as taught in many Indian university syllabi (Anna University, Mumbai University, etc.). Practise numerical problems with them until each substitution is routine.