
Re: Graphs



Dear Linsey:

I've had several questions like yours during the week and I'm beginning to understand what you-all don't understand. I'm broadcasting the answer to the list, to find out if I'm saying the same thing to all of you and also with the hope that I'll remember the answer in the future.

The pictures you have don't do any good because they just help you see that you have data ill suited to a Poisson. We know that already, but we want to go through the motions as if it were suitable.

You need to get predicted values for some range of X and plot the curve through the points. It's exactly the same exercise as logistic regression.

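mymod <- glm(Y ~ X, family=poisson)   # the log link is the default

Suppose X ranges from 1 to 10

# predicted counts on the Y scale (not the log scale)
pred1 <- predict(mymod, newdata=data.frame(X=1:10), type="response")

plot(X, Y, type="n")   # set up the axes
points(X, Y)           # the raw data
lines(1:10, pred1)     # the fitted Poisson curve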
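To try the recipe above without real data, here is a minimal sketch on simulated counts. Everything about the data (20 cases at each X from 1 to 10, intercept 0.2, slope 0.15) is made up for illustration. Predicting on a fine grid of X values, rather than just 1:10, makes the plotted curve smoother.

```r
# Simulated data; the "true" values 0.2 and 0.15 are arbitrary
set.seed(1)
X <- rep(1:10, each = 20)
Y <- rpois(length(X), lambda = exp(0.2 + 0.15 * X))

# Poisson regression (log link is the default)
mymod <- glm(Y ~ X, family = poisson)

# Predicted counts on the Y scale over a fine grid of X
xgrid <- seq(1, 10, length.out = 100)
pred1 <- predict(mymod, newdata = data.frame(X = xgrid), type = "response")

# Then plot as before: plot(X, Y); lines(xgrid, pred1)
range(pred1)
```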

Why do this?

Start with OLS. The thing we use as a predicted value is the expected value of Y given X:

E(Y|X) = a + b X

and to account for variation in Y, we say

Y ~ Normal( a+b X, sigma^2)

Now, if you have a count variable, the default formula used in glm with the Poisson family has an expected ("mean") value

E(Y|X) = exp(a + b X)

Note that if you take the log of both sides, you have

log(E(Y|X)) = a + b X

For some reason, the pioneers of these models were emphatic about thinking of the transformation in that way, as a fudge on the mean prediction. That is, they wanted to talk about the transformation from the "mean" back to the linear predictor. They call that transformation (here, the log) the link function.

The exp(a + b X) curve is the curved "line" you see if you plot predictions. In glm, you get that model with glm(Y~X,family=poisson(link=log)), which is the default, so if you just do glm(Y~X,family=poisson) you get the same thing.
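A quick way to see the two scales in R is to ask predict() for both. Assuming simulated data (the coefficients 0.5 and 0.1 are arbitrary), type = "link" returns the linear predictor a + bX and type = "response" returns exp(a + bX), the expected count.

```r
set.seed(2)
X <- rep(1:10, each = 20)
Y <- rpois(length(X), lambda = exp(0.5 + 0.1 * X))
mymod <- glm(Y ~ X, family = poisson(link = "log"))   # same as family = poisson

a <- unname(coef(mymod)[1])
b <- unname(coef(mymod)[2])
newX <- 1:10

# Linear predictor a + bX (the "link" scale)
eta <- predict(mymod, newdata = data.frame(X = newX), type = "link")
# Expected count exp(a + bX) (the "response" scale)
mu <- predict(mymod, newdata = data.frame(X = newX), type = "response")

all.equal(unname(eta), a + b * newX)       # TRUE
all.equal(unname(mu), exp(a + b * newX))   # TRUE
```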

But the distribution of Y is different: it is Poisson

Y ~ Poisson( exp(a+ b X) )

The exp() there translates a + bX into the mean of Y. The Poisson has the property that its expected value equals its variance equals its "lambda" parameter, the one I called "input" on my handout. The exp() is used for a variety of reasons. One really big reason is that lambda, the mean, has to be positive; otherwise the Poisson distribution is not defined (there is no Poisson with a negative mean). The linear predictor a + bX can be any number, but exp(a + bX) is always positive. You can find other transformations besides exp() that keep the mean positive, and they are OK. You can even chance it, ignore the problem, and run a Poisson with
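The two facts in that paragraph can be checked numerically. This sketch assumes an arbitrary lambda of 3 for the simulation; it shows that a large Poisson sample has mean and variance both near lambda, that exp() turns negative linear predictors into positive means, and that R's dpois() returns NaN (with a warning) for a negative lambda.

```r
# lambda = 3 is an arbitrary illustrative value
set.seed(3)
draws <- rpois(1e5, lambda = 3)
mean(draws)   # close to 3
var(draws)    # also close to 3: Poisson mean = variance = lambda

# exp() maps any linear predictor a + bX, negative or not, to a positive mean
linpred <- c(-2, 0, 2)
exp(linpred)  # all strictly positive

# A negative lambda is not a valid Poisson parameter: R returns NaN (and warns)
is.nan(suppressWarnings(dpois(1, lambda = -1)))   # TRUE
```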

E(Y|X) = a + b X.

If you want that, do glm(Y~X,family=poisson(link=identity))
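Here is a sketch of the identity-link version, assuming simulated data whose true mean 2 + 0.5X stays positive over the whole range of X. With this link the linear predictor and the expected count are the same number. If the fitted line dips below zero, glm can fail to fit, and supplying starting values through its start argument may help; that caveat is an assumption to check on your own data.

```r
set.seed(4)
X <- rep(1:10, each = 20)
Y <- rpois(length(X), lambda = 2 + 0.5 * X)   # mean is linear in X and positive

idmod <- glm(Y ~ X, family = poisson(link = "identity"))
coef(idmod)   # should land near 2 and 0.5

# With the identity link, the linear predictor IS the expected count
all.equal(predict(idmod, type = "link"), predict(idmod, type = "response"))   # TRUE
```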

So if you want to compare the Normal model against Poisson, you should transform the equation for the Normal model so that the expected value is the same. If you did OLS with this assumption

E(Y|X) = exp(a + b X)

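Then the predicted values of the 2 models would match if their a and b were the same. You'd estimate that, approximately, in lm by

olsmod <- lm(log(Y) ~ X)   # a new name, so the Poisson mymod is not overwritten

Get predicted values like always, except: the predicted values from that are logged values, so you need to translate them back into the Y scale with

predY <- exp(predict(olsmod, newdata=data.frame(X=1:10)))

Strictly speaking, lm(log(Y) ~ X) estimates the mean of log(Y), not the log of the mean of Y, which is why the match is only approximate. Also, log(Y) is undefined when Y is 0.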
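One practical snag with lm(log(Y) ~ X) on counts: a zero count gives log(0) = -Inf, and lm() stops with an error. The small data set below is made up just to trigger that; the Poisson glm fits the same data without complaint.

```r
# Small made-up data set in which one count is 0
X <- 1:7
Y <- c(0, 1, 3, 2, 5, 4, 7)

log(Y)[1]   # -Inf, since log(0) is not a finite number

# lm() stops with an error about NA/NaN/Inf in 'y'
res <- tryCatch(lm(log(Y) ~ X), error = function(e) e)
inherits(res, "error")   # TRUE

# The Poisson glm has no trouble with the zero
pmod <- glm(Y ~ X, family = poisson)
fitted(pmod)   # all positive
```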



Then get the predicted values out of the Poisson model with the predict function (don't forget the option type="response"), make a plot with the raw data points, and overlay the 2 kinds of lines.
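Putting the comparison together, here is a minimal sketch on simulated counts (the values 2 and 0.1 are arbitrary, and any zero counts are dropped before the log fit). It back-transforms the OLS predictions with exp(), gets the Poisson predictions with type = "response", and overlays both curves on the raw points.

```r
set.seed(5)
X <- rep(1:10, each = 20)
Y <- rpois(length(X), lambda = exp(2 + 0.1 * X))
keep <- Y > 0   # log(Y) needs Y > 0

olsmod  <- lm(log(Y) ~ X, subset = keep)
poismod <- glm(Y ~ X, family = poisson)

newX <- data.frame(X = seq(1, 10, length.out = 100))
predOLS  <- exp(predict(olsmod, newdata = newX))              # back to the Y scale
predPois <- predict(poismod, newdata = newX, type = "response")

plot(X, Y)
lines(newX$X, predPois, lty = 1)
lines(newX$X, predOLS, lty = 2)
legend("topleft", legend = c("Poisson glm", "OLS on log(Y)"), lty = c(1, 2))
```

With counts this large the two curves should sit close together, the OLS one a little lower because it tracks the mean of log(Y).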

In the example that Nathan had, the 2 lines were quite close.

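Of course, the scatter of points implied by a model depends on the distribution it assumes for Y. So, with your eye, you can decide for yourself whether the Normal or the Poisson seems more "right". Under the Normal, points would be evenly divided on either side of the line, and OLS also requires homoskedasticity (the same spread around the line at every value of X). Under the Poisson, the spread grows with the predicted value, because the variance equals the mean.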

On the other hand, if the predicted value is small, then the distribution of Y in a Poisson is far from normal: it is piled up near 0 and skewed to the right.


In the days before VGA graphics, we had to do all artwork with letters, so this takes me back:

XX
XX
XX  XX
XX  XX
XX  XX  XX
XX  XX  XX  XX
XX  XX  XX  XX  XX


As the predicted value goes up, the distribution of Y predicted by Poisson looks more and more normal.


                 XX
               XXXXXX
             XXXXXXXXXXX
          XXXXXXXXXXXXXXXXXX

If your expected value is a big number, say 100, then whether it is Poisson or Normal is not very important. However, for a small predicted value, there is a huge difference. Doesn't my beautiful ASCII art show it???
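For those who prefer numbers to ASCII art, this sketch compares Poisson probabilities with a Normal density having the same mean and variance (both equal to lambda). The helper name compare() is just for illustration; it reports the largest gap between the two over a range of Y values that covers nearly all the probability.

```r
# Largest gap between Poisson(lambda) and Normal(lambda, sd = sqrt(lambda))
compare <- function(lambda) {
  y <- 0:(lambda + 10 * sqrt(lambda))   # covers nearly all the mass
  max(abs(dpois(y, lambda) - dnorm(y, mean = lambda, sd = sqrt(lambda))))
}

compare(1)     # large gap: the Poisson is piled up against 0
compare(100)   # tiny gap: the two are nearly indistinguishable
```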



I haven't been feeling well for the past few days; otherwise I'd offer to come to work and help you. But if you have questions about how to make R do your work, feel free to ask me in ps707-l.



linseym wrote:
Prof. Johnson, I just tried emailing you this, so disregard this message if you already got it; the computer said it was having problems with the webmail connection. So here goes: I ran the Poisson graphs but have no clue how to interpret them (remember my dep.var. is the awful 4 ordinal categories). If you can, please help.
Thanks, Linsey

I have attached the graphs for you to look at,
they look really cool.


--
Paul E. Johnson                       email: pauljohn_AT_ku.edu
Dept. of Political Science            http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas                  Office: (785) 864-9086
Lawrence, Kansas 66044-3177           FAX: (785) 864-5700