
Log-likelihood proof and AIC hypothesis

First of all, statistics is just not my thing … yet (I hope!)

I’m having a hard time working out the log-likelihood equation:

Given $Y \rightarrow \mathcal{N}(\mu_1,\sigma_1)$ (observation) and $\hat{Y} \rightarrow \mathcal{N}(\mu_2,\sigma_2)$ (prediction), both approximately normal because of our large sample ($\approx 2500$ values), we have

$$Y-\hat{Y} \rightarrow \mathcal{N}(\mu,\sigma) \quad \text{where} \quad \mu = \mu_1 - \mu_2 = 0 \quad\text{and}\quad \sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$$
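To convince myself of this, I ran a quick numerical check (NumPy; the parameter values are made up, and I assume $Y$ and $\hat{Y}$ are independent, which the subtraction rule requires):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical parameters; any values work for this check.
mu1, sigma1 = 4.0, 1.5
mu2, sigma2 = 4.0, 2.0
n = 200_000

Y = rng.normal(mu1, sigma1, size=n)
Yhat = rng.normal(mu2, sigma2, size=n)   # drawn independently of Y
diff = Y - Yhat

# Empirically, Y - Yhat has mean mu1 - mu2 and std sqrt(sigma1^2 + sigma2^2).
assert abs(diff.mean() - (mu1 - mu2)) < 0.05
assert abs(diff.std() - np.sqrt(sigma1**2 + sigma2**2)) < 0.05
```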

Therefore, the probability density function $f$ of our normal distribution can be written as follows:

\begin{equation}
f(y-\hat{y}\,|\,0,\sigma) = \dfrac{1}{\sigma\sqrt{2\pi}}
\exp\left(-\frac{\left(y-\hat{y}\right)^2}{2\sigma^2}\right)
\end{equation}

The likelihood formula is the following, with $x_i=y_i-\hat{y}_i$:
$$
\begin{align}
\mathcal{L}(\theta\,;\,x_1,\dots,x_n) &= \prod_{i=1}^n f(x_i\,|\,\theta)\\
& = \frac{1}{\sigma^n\left(\sqrt{2\pi}\right)^n}\, \exp \left(-\frac{1}{2\sigma^2}\sum_{i=1}^n x_i^2\right)
\end{align}
$$

The log-likelihood is therefore given by:
$$
\ln\left(\mathcal{L}\left(\theta\,;\,x_1,\dots,x_n\right)\right) =
-\frac{n}{2}\left[\ln\left(\sigma^2\right)+\ln\left(2\pi\right)+
\frac{1}{n\sigma^2}\sum_{i=1}^{n} x_i^2\right]
\]
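As a sanity check, this closed form matches a term-by-term sum of the Gaussian log-densities (NumPy; the residuals are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical residuals x_i = y_i - yhat_i, drawn from N(0, sigma).
sigma = 1.5
x = rng.normal(0.0, sigma, size=2500)
n = len(x)

# Log-likelihood from the closed form (note ln(sigma^2), not ln(sigma)).
logL_formula = -n / 2 * (np.log(sigma**2) + np.log(2 * np.pi)
                         + np.sum(x**2) / (n * sigma**2))

# Direct sum of the Gaussian log-densities, term by term.
logL_direct = np.sum(-np.log(sigma * np.sqrt(2 * np.pi))
                     - x**2 / (2 * sigma**2))

assert np.isclose(logL_formula, logL_direct)
```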

..

Standardizing data using $z_i = \dfrac{y_i-\mu_1^i}{\sigma_1^i}$ and
$\hat{z}_i = \dfrac{\hat{y}_i-\mu_2^i}{\sigma_2^i}$, we have:

$$
\begin{align}
&Z - \hat{Z} \rightarrow \mathcal{N}(0,1)\\
&f(z-\hat{z}\,|\,0,1) = \dfrac{1}{\sqrt{2\pi}}
\exp\left(-\frac{\left(z-\hat{z}\right)^2}{2}\right)\\
&\mathcal{L}(\theta\,;\,x_1,\dots,x_n) = \frac{1}{\left(\sqrt{2\pi}\right)^n}\, \exp \left(-\frac{1}{2}\sum_{i=1}^n x_i^2\right)
\quad \text{with } x_i = z_i - \hat{z}_i\\
&\ln\left(\mathcal{L}\left(\theta\,;\,x_1,\dots,x_n\right)\right) =
-\frac{n}{2}\left[\ln\left(2\pi\right)+
\frac{1}{n}\sum_{i=1}^{n} x_i^2\right]
\end{align}
$$

What am I doing wrong? When computing the AIC, the only log-likelihood formula I see used is the one below:

$$
\ln\left(\mathcal{L}\left(\theta\,;\,x_1,\dots,x_n\right)\right) =
-\frac{n}{2}\left[1 + \ln\left(2\pi\right)+
\ln\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2\right)\right]
$$

This formula seems to come from the log-normal distribution, although I did not manage to demonstrate that either.
I’m not asking for the entire proof, just a hint on how to do it the right way.
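For what it’s worth, a quick numerical check shows that this expression coincides with the Gaussian log-likelihood above once $\sigma^2$ is replaced by $\frac{1}{n}\sum_{i=1}^{n} x_i^2$ (residuals simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=2500)   # hypothetical residuals
n = len(x)

# The formula used in AIC computations:
logL_aic = -n / 2 * (1 + np.log(2 * np.pi) + np.log(np.mean(x**2)))

# The plain Gaussian log-likelihood, evaluated at
# sigma^2 = (1/n) sum x_i^2:
s2 = np.mean(x**2)
logL_gauss = -n / 2 * (np.log(s2) + np.log(2 * np.pi)
                       + np.sum(x**2) / (n * s2))

assert np.isclose(logL_aic, logL_gauss)
```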

..

My second question is:
When I say ‘variables’ below, I am referring to the independent ones. The dependent variable is and will always remain the same across the models.

Is it statistically or mathematically valid to do a multivariate linear regression using linear AND non-linear variables, if those non-linear variables seem to improve the model?
If the answer is YES, can the AIC be used on a model combining linear and non-linear variables?

Example:
$Y$ is the dependent variable and $A_1,A_2,A_3,A_4$ are the independent variables.

The model to be found will be of the form: $\hat{Y} = \beta_0 + \sum_{i=1}^{4} \beta_i A_i$

Now, plotting $Y$ against each $A_i$ gives the following results:

$Y$ vs $A_1$ or $A_2$ seems linear.

$Y$ vs $A_3$ or $A_4$ does not seem linear.

And the models to be tested are as follows:

$\mathcal{M}_1$ includes $A_1$

$\mathcal{M}_2$ includes $A_1$ and $A_2$

$\mathcal{M}_3$ includes $A_1$ and $A_3$

$\mathcal{M}_4$ includes $A_1$, $A_2$ and $A_4$

$\mathcal{M}_5$ includes $A_1$, $A_2$, $A_3$ and $A_4$

Can the AIC be used to compare those 5 models? (I know that AIC can be used on non-nested models).
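To make the comparison concrete, here is how I would compute the AIC for the five candidate models on simulated data (NumPy only; the data-generating coefficients are invented, with only $A_1$ and $A_2$ truly entering $Y$):

```python
import numpy as np

# Hypothetical data: 4 predictors, but Y depends only on A1 and A2.
rng = np.random.default_rng(2)
n = 2500
A = rng.normal(size=(n, 4))   # columns: A1..A4
y = 3.0 + 1.5 * A[:, 0] + 0.8 * A[:, 1] + rng.normal(scale=0.5, size=n)

def aic_ols(y, X):
    """Fit OLS with an intercept and return the Gaussian AIC."""
    m = len(y)
    X = np.column_stack([np.ones(m), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Concentrated Gaussian log-likelihood with sigma^2 = mean(resid^2).
    logL = -m / 2 * (1 + np.log(2 * np.pi) + np.log(np.mean(resid**2)))
    k = X.shape[1] + 1   # fitted coefficients plus the error variance
    return 2 * k - 2 * logL

models = {
    "M1": A[:, [0]],
    "M2": A[:, [0, 1]],
    "M3": A[:, [0, 2]],
    "M4": A[:, [0, 1, 3]],
    "M5": A[:, [0, 1, 2, 3]],
}
aics = {name: aic_ols(y, X) for name, X in models.items()}
best = min(aics, key=aics.get)   # lowest AIC is preferred
```

Since $A_2$ genuinely drives $Y$ here, $\mathcal{M}_2$ gets a much lower AIC than $\mathcal{M}_1$ or $\mathcal{M}_3$.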

Last question: Can somebody enumerate the hypotheses that must be checked in order to use the AIC? I’ve read a lot about the AIC, but I can’t make up my mind about what is mandatory and what is not.
The hypotheses I’ve seen so far are the following:

1) The most complete model (containing all the variables we want to test) must be tested against the dependent variable values. If the $\chi^2$ test is not significant at the $95\%$ level, we cannot use the AIC.

2) Each independent variable must be tested against the dependent variable, alone. Only the ones that seem linear must be retained.

3) Predictive variables must be independent and normally distributed.

Thank you very much for your answers.

