Re: Graphs
Dear Linsey:
I've had several questions like yours during the week and I'm beginning
to understand what you-all don't understand. I'm broadcasting the answer
to the list, to find out if I'm saying the same thing to all of you and
also with the hope that I'll remember the answer in the future.
The pictures you have don't do any good because they just help you see
that you have data ill-suited to a Poisson. We know that already, but we
want to go through the motions as if it were suitable.
You need to get predicted values for some range of X and plot a line
through the points. It's exactly the same exercise as logistic regression.
mymod <- glm(Y~X,family=poisson)
Suppose X ranges from 1 to 10
pred1 <- predict(mymod, newdata=data.frame(X=1:10), type="response")
plot(X,Y, type="n")
points(X,Y)
lines(1:10, pred1)
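For anyone who wants to run those steps end to end, here is a minimal
sketch on made-up data. The simulated X and Y, the seed, and the
coefficients 0.2 and 0.25 are assumptions for illustration only; swap in
your own variables.

```r
# Simulated data: an assumption for illustration, not anyone's real data
set.seed(707)
X <- rep(1:10, each = 20)
Y <- rpois(length(X), lambda = exp(0.2 + 0.25 * X))

# Poisson regression with the default log link
mymod <- glm(Y ~ X, family = poisson)

# Predicted means on the scale of Y for X = 1, ..., 10
newX <- data.frame(X = 1:10)
pred1 <- predict(mymod, newdata = newX, type = "response")

plot(X, Y)            # raw data points
lines(newX$X, pred1)  # fitted Poisson mean curve
```

The predicted values are always positive and trace a curve, not a
straight line, because they are exp() of the linear predictor.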
Why do this?
Think about how we describe OLS. The thing we use as a predicted value is the expected
value of Y given X:
E(Y|X) = a + b X
and to account for variety in Y, we say
Y ~ Normal( a+b X, sigma^2)
Now, if you have a count variable, the default formula used in glm with
the Poisson family has an expected ("mean") value
E(Y|X) = exp(a + b X)
Note if you log both sides you have
log(E(Y|X)) = a + b X
For some reason, the pioneers of these models were emphatic about
thinking of the transformation in that way, as a fudge on the mean
prediction. That is, they wanted to talk about the transformation from
the "mean" back to the linear predictor. They call that a link function;
here it is the log. The mean itself, exp(a + b X), is the curved "line"
you see if you plot predictions. In glm, you can ask for the log link
with glm(Y~X,family=poisson(link=log)), but it is the default, so
glm(Y~X,family=poisson) gives you the same thing.
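A quick way to see the link function at work is to ask predict for both
scales and check that one is the exp() of the other. This is only a
sketch; the simulated data and the coefficients are assumptions for
illustration.

```r
# Simulated data: an assumption for illustration
set.seed(707)
X <- rep(1:10, each = 20)
Y <- rpois(length(X), lambda = exp(0.2 + 0.25 * X))

mymod <- glm(Y ~ X, family = poisson(link = log))

newX <- data.frame(X = 1:10)
eta <- predict(mymod, newdata = newX, type = "link")      # a + b X
mu  <- predict(mymod, newdata = newX, type = "response")  # exp(a + b X)

# The response scale is exp() of the link scale
all.equal(exp(eta), mu)

# The link scale is just the straight line built from the coefficients
a <- coef(mymod)[1]
b <- coef(mymod)[2]
all.equal(unname(eta), unname(a + b * newX$X))
```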
But the distribution of Y is different: it is Poisson,
Y ~ Poisson( exp(a+ b X) )
The use of the exp there translates the a+bX into the mean of Y. The
Poisson has the property that its expected value equals its variance
equals its "lambda" parameter, the one I called "input" on my handout.
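If you want to convince yourself of the mean-equals-variance property,
a simulation does it. The lambda of 4 and the sample size are arbitrary
choices for this sketch.

```r
set.seed(707)
lambda <- 4                 # arbitrary value picked for the demo
y <- rpois(100000, lambda)  # large sample of Poisson draws

mean(y)  # close to lambda
var(y)   # also close to lambda
```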
The exp() is used for a variety of reasons. One really big reason is
that the mean of a Poisson has to be greater than 0, and (a + bX) by
itself can come out negative. With a negative mean, the Poisson
distribution is not defined. (Poisson does not exist for negative
means.) You can find other transformations besides exp() that turn
(a+bX) into a positive number; they are OK. You can even chance it and
ignore the problem and run a Poisson with
E(Y|X) = a + b X.
If you want that, do glm(Y~X,family=poisson(link=identity))
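Here is a hedged sketch of the identity-link version. The data are
simulated so that a + b X stays positive over the whole range of X, and
the start values are an assumption added to give glm a valid place to
begin; with real data the fit can fail if the fitted mean drifts to zero
or below.

```r
# Simulated data with a linear (not exponential) mean: an assumption
set.seed(707)
X <- rep(1:10, each = 20)
Y <- rpois(length(X), lambda = 2 + 1.5 * X)

# start = c(a, b) gives glm an initial line that is positive for all X
mymod_id <- glm(Y ~ X, family = poisson(link = "identity"),
                start = c(1, 1))

coef(mymod_id)          # estimates of a and b on the scale of Y
range(fitted(mymod_id)) # fitted means must stay above 0
```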
So if you want to compare the Normal model against Poisson, you should
transform the equation for the Normal model so that the expected value
has the same form. If you did OLS with this assumption
E(Y|X) = exp(a + b X)
then the predicted values of the 2 models would match if their a and b
were the same. The usual way to approximate that in lm is
mymod_lm <- lm(log(Y) ~ X)
(Strictly, that models E(log(Y)|X) = a + b X, and log(Y) needs Y > 0,
so zero counts have to be dropped or shifted first.)
Get predicted values like always, except that the predicted values from
that model would be on the log scale, and you'd need to translate back
into the Y scale with
predY <- exp(predicted_lm_values)
Then get the predicted values out of the Poisson model with the
predict function (don't forget to add the predict option type="response"),
make a plot with the raw data points, and overlay the 2 kinds
of lines.
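The comparison can be sketched as follows. Everything about the data
here is an assumption for illustration (simulated counts with means big
enough that zeros are very unlikely); the object names pois_mod and
lm_mod are made up for this example.

```r
# Simulated counts: an assumption for illustration
set.seed(707)
dat <- data.frame(X = rep(1:10, each = 20))
dat$Y <- rpois(nrow(dat), lambda = exp(2.5 + 0.15 * dat$X))

pois_mod <- glm(Y ~ X, family = poisson, data = dat)
# log(Y) needs Y > 0, so any zero counts are dropped for the lm fit
lm_mod <- lm(log(Y) ~ X, data = dat, subset = Y > 0)

newX <- data.frame(X = 1:10)
pred_pois <- predict(pois_mod, newdata = newX, type = "response")
pred_lm <- exp(predict(lm_mod, newdata = newX))  # back to the Y scale

plot(dat$X, dat$Y, xlab = "X", ylab = "Y")  # raw data points
lines(newX$X, pred_pois, lty = 1)           # Poisson glm curve
lines(newX$X, pred_lm, lty = 2)             # back-transformed lm curve
legend("topleft", legend = c("Poisson glm", "exp(lm on log(Y))"),
       lty = 1:2)
```

With data like these the two curves should land close together, though
not exactly on top of each other, since the lm is fit to log(Y) and not
to Y itself.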
In the example that Nathan had, the 2 lines were quite close.
Of course, the scatter of points that is implied by a model depends
on the distribution it assumes for Y. So with your eye, you can decide
for yourself if the Normal or Poisson seems more "right". Normal would
have points evenly divided on either side of the line, and OLS requires
homoskedasticity as well.
On the other hand, if the predicted value is small, then the
distribution of Y in a Poisson is far from normal.
In the days before VGA graphics, we had to do all artwork with letters,
so this takes me back:
XX
XX
XX XX
XX XX
XX XX XX
XX XX XX XX
XX XX XX XX XX
As the predicted value goes up, the distribution of Y predicted by
Poisson looks more and more normal.
        XX
      XXXXXX
   XXXXXXXXXXX
XXXXXXXXXXXXXXXXXX
If your expected value is a big number, say 100, then whether it is
Poisson or Normal is not very important. However, for a small predicted
value, there is a huge difference. Doesn't my beautiful ASCII art
show it???
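For a numerical version of the same picture, one can compare Poisson
probabilities with a Normal density that has the same mean and variance.
This is a rough sketch; the helper max_gap is a made-up name, and the
means 1 and 100 are just the small and big cases discussed here.

```r
# Largest gap between Poisson probabilities and a Normal density with
# the same mean and variance, evaluated at the integers 0, 1, 2, ...
max_gap <- function(lambda) {
  y <- 0:ceiling(lambda + 10 * sqrt(lambda) + 10)
  max(abs(dpois(y, lambda) - dnorm(y, mean = lambda, sd = sqrt(lambda))))
}

gap_small <- max_gap(1)    # small mean: the two shapes differ a lot
gap_big   <- max_gap(100)  # big mean: the two shapes nearly coincide
c(small = gap_small, big = gap_big)
```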
I haven't been feeling so well for the past few days, otherwise I'd offer
to come to work and help you. But if you have questions about how to make
R do your work, you can feel free to ask me in ps707-l.
linseym wrote:
Prof. Johnson, I just tried emailing you this so disregard this message if you
already got it, the computer said it was having problems with the webmail
connection. So here goes: I ran the Poisson graphs but have no clue how to
interpret them (remember my dep. var. is the awful 4 ordinal categories). If
you can, please help.
Thanks, Linsey
I have attached the graphs for you to look at,
they look really cool.
--
Paul E. Johnson email: pauljohn_AT_ku.edu
Dept. of Political Science http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas Office: (785) 864-9086
Lawrence, Kansas 66044-3177 FAX: (785) 864-5700