Mystic Statistics Logistics: 2007

Wednesday, May 9, 2007

ANCOVA won me ova'

The primary analysis I used for my fire ant project was an ANCOVA. Apparently, this is the standard method of teasing apart background effects in studies like mine, where there is one treatment field and one control field observed to account for seasonal variation. I found it a little frustrating because, in order to do an ANCOVA, you must first show that the response variable changes in response to the covariate and that there is a significant correlation between the covariate and independent variable. Since I ran separate ANCOVAs on different species of ants, and some fit the model while others didn't, I was a bit nervous that I would have to go back to square one and find a new model for the data. However, after some consultation, it turns out that as long as the model is appropriate for some of the other species, and the species in question shows at least a trend toward passing one of the statistical tests for appropriateness of the ANCOVA, it is OK to use the model. PHEW!
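For anyone wondering what the covariate adjustment actually does, here is a minimal sketch in Python (not R, and not my actual analysis). It assumes a single covariate and a common slope in both fields. It computes the pooled within-group slope and then compares the field means after adjusting them to the same covariate value. The field names, the helper functions and every number are made up for illustration.

```python
# Minimal ANCOVA-style sketch: pooled within-group slope and adjusted means.
# All numbers below are made up for illustration; they are not the fire ant data.

def mean(values):
    return sum(values) / len(values)

def pooled_slope(groups):
    """Common within-group slope of the response on the covariate.

    groups maps a group name to a (covariate, response) pair of lists.
    """
    sxy = 0.0
    sxx = 0.0
    for x, y in groups.values():
        x_bar, y_bar = mean(x), mean(y)
        sxy += sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        sxx += sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def adjusted_means(groups):
    """Group means of the response, adjusted to the grand mean of the covariate."""
    slope = pooled_slope(groups)
    all_x = [xi for x, _ in groups.values() for xi in x]
    grand_x = mean(all_x)
    return {
        name: mean(y) - slope * (mean(x) - grand_x)
        for name, (x, y) in groups.items()
    }

# Hypothetical fields: covariate = some background measure, response = ant count.
fields = {
    "control": ([1, 2, 3, 4], [3, 5, 7, 9]),
    "treatment": ([3, 4, 5, 6], [11, 13, 15, 17]),
}

slope = pooled_slope(fields)
adjusted = adjusted_means(fields)
raw_gap = mean(fields["treatment"][1]) - mean(fields["control"][1])
adjusted_gap = adjusted["treatment"] - adjusted["control"]
print(slope, raw_gap, adjusted_gap)
```

With these invented numbers the raw gap between the fields is 8, but once the covariate is accounted for the gap shrinks to 4. A real ANCOVA would also test whether the slopes really are the same in both groups and attach standard errors to everything, which this sketch skips.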

ANCOVA

Multiple players
Unstack the deck
Now deal

~from "The Tao of Statistics" by Dana K. Keller

Tuesday, May 1, 2007

My learning curve...

While working on my final project this evening, I had a realization. Not only was I having a "friendly" interaction with R, but I actually understood what it was telling me. Working with this program has been a strange conversational evolution. In the beginning, I didn't know how to talk to it, and when I haphazardly managed to get a response from it, I didn't understand what it was telling me. Now, I've learned how to speak the R language (admittedly not fluently!) and the statistical language it spits back at me is making more and more sense every day.

I also glanced back at some of the preliminary analyses I attempted on my data set before taking this class and it sent a shiver up my spine. It's amazing how far I've come from those frustrated fumblings with JMP 5.1 and SigmaPlot.

It's a good feeling to think that all of those long nights with Verzani, Gotelli, Ellison and R are paying off.

The Wonderful Wiki...

This blog is dedicated to the Wiki.

It has been such a wonderful supplement to the texts and my normal class notes. For someone just learning biostatistics, sometimes the class seems a bit fast paced and I have trouble catching everything. Several times I have found myself needing to solve a problem whose solution I only vaguely remember from something we discussed or an exercise we did in class. Most of the time, I am able to browse the Wiki and find either the answer or a good hint to point me in the right direction.

I appreciate everyone's effort in keeping it up-to-date this semester. It has been a very valuable resource.

Wednesday, April 11, 2007

Cocktail-Making-Contests & Statistical Analysis

On Friday night I attended a surprise, birthday, cocktail-making-contest party for a girl in my lab named Genoveva. The idea was that party goers would compete to whip up the drink that the birthday girl liked the best. Everyone brought the fixins for their concoctions and mayhem ensued...

Later in the evening, after Genoveva had tried all of the drinks, I noticed two people (both PhDs) huddled over a sheet of paper, scribbling columns of numbers and names of drinks. When I asked them what they were doing, they said, "We're figuring out the results of the cocktail-making-contest." Amused by their dedication to this analysis, I listened in on their calculations for a while.

They had one column of values for Genoveva's "score" of the drink on a scale of 1-10. The second column was the percentage of the drink she had actually consumed. I couldn't hear everything they were saying, but I gathered that they were using the percent consumed as a covariate with drink score to determine who the true winner of the contest was, because, presumably, if she did not finish all of the drink, then she couldn't have really liked it that much. I noticed that they did not consider confounding factors, such as how many drinks she'd had before she tried each drink, presentation, how many drinks she was being offered at once, whether or not she cleansed her palate with water between drinks, etc. However, as far as cocktail-making-contest analysis goes, I'm willing to bet that this was one of the most advanced calculations ever done.

I leaned over to the person next to me and whispered, "Who invited these nerds?"

Monday, April 9, 2007

ANOVA and my circle

Heidi's presentation last week on ANOVA was my first real introduction to a statistical analysis that I probably should have familiarized myself with a long time ago. Finally having these concepts explained to me in plain English (well, as close as you can get with this stuff, I suppose) made me realize that for years I've heard and read this term over and over without really understanding what it meant. This got me thinking about something a professor once told me: everything we know can be represented by the area inside a circle; everything we don't know is the infinite area outside of the circle; and everything that we realize we don't know is the circumference of the circle.

Therefore, when we know little, we often think we know nearly everything. The more we learn, the more we realize we don't know. Only those who know a lot realize how little they really know.

Wednesday, March 28, 2007

DDIG

Well, I finally turned in my NSF dissertation improvement grant that I had to write for my Processes of Science in Ecology and Evolutionary Biology class! It was due at 5 pm today and I hit send on the e-mail just seconds before 5:01! Phew!

It is very unusual for a first-semester grad student to have to write a grant like this, but I suppose it was a growing experience and I've learned a lot about the process.

One thing I realized is that I have a long way to go before I will be able to determine what kinds of statistical analyses to apply to my research. When I presented the material on experimental designs and statistical analyses, it seemed like such a neat little formula... all you have to know is if your independent variable and dependent variable are continuous or categorical and VOILA! you have the perfect recipe for your statistical analysis!
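That "neat little formula" can be sketched as a simple lookup table. This is a rough Python sketch of the usual textbook starting point as I understand it, not a rule for every design. The dictionary and the function name are made up for illustration.

```python
# Rough lookup of analysis family by variable type.
# This is the usual textbook starting point, not a rule for every design.
ANALYSIS_BY_VARIABLE_TYPE = {
    # (independent variable, dependent variable): analysis family
    ("categorical", "continuous"): "ANOVA",
    ("continuous", "continuous"): "regression",
    ("continuous", "categorical"): "logistic regression",
    ("categorical", "categorical"): "contingency table (e.g. chi-square)",
}

def suggest_analysis(independent, dependent):
    """Return the textbook analysis family for the given variable types."""
    key = (independent.lower(), dependent.lower())
    if key not in ANALYSIS_BY_VARIABLE_TYPE:
        raise ValueError("variable types must be 'continuous' or 'categorical'")
    return ANALYSIS_BY_VARIABLE_TYPE[key]

print(suggest_analysis("categorical", "continuous"))
```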

But it's not always that simple when you are trying to pick up a legacy project and work with a design that has been pseudoreplicated and has many uncontrollable confounding factors!

I still have so much to learn...

Thursday, March 8, 2007

Posting Graphs From "R" to Your Blog For Dummies

Hello Everyone!

For those of you itching to post your graphs from "R" onto your blogs, here are the steps I took to do it (Be aware that there may be a more efficient way, but this is what has worked so far for me):

1) Create your graph in "R"
2) Right click the graph and select "copy as bitmap"
3) In the "Start" menu, select "Programs" and then "Accessories". Click on the "Paint" program.
4) In Paint, click on the "select" tool. It will look like a box made out of dotted lines (probably in the upper left-hand corner of the screen).
5) Right click in the center of the blank part of the screen and click paste.

At this point, your graph should appear on the screen. You can use the text and drawing tools in Paint to add more detail or explanation to your figures.

6) Save the file as a bitmap on your computer.
7) Sign in to your blog.
8) Click "View Blog"
9) In the top right corner, click on "Customize" (the location of this may vary depending on what layout you selected for your blog)
10) On the new page, click "Add a Page Element"
11) You will have several options to choose from. Select the "Add to Blog" option under the space designated to adding pictures.
12) Browse your files to find your graph the same way you would add an attachment to an e-mail. You also have the option of adding a title and comment on the picture.
13) Click "save" and "preview".

Voila! Your graph should be posted! :)

Wednesday, February 28, 2007

Eureka!

Ok, so I should be sound asleep by now, but instead I've been sitting here plugging away at Verzani and trying to make heads or tails of my homework problems. Here's a summary of the conversation I was having with "R":

me> do this
R: NO! Error could not find function
me> ok, try this instead?
R: NO! Syntax Error
me> Ummmmm... let's be reasonable. How about this?
R: NO! Warning message
me> Puhleeeeeaaaaaasssssee do this? I'll be your friend.
R: ABSOLUTELY NOT! ERROR! ERROR! ERROR!

I have pages like this...

Anyway, I was getting the numbers associated with the Normal(x,y) distribution abbreviation mixed up with max and min. Just before I was about to throw in the towel (resolving to deprive myself of more sleep and work on this early in the morning), I realized that the numbers in the brackets are actually the mean and standard deviation (how did I miss that!?!). After having this epiphany, I was able to finish my homework problems in 10-15 minutes!
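To make the notation concrete, here is a small sketch in Python rather than R. It uses the standard library's `NormalDist`, and the values 100 and 15 are arbitrary, chosen only for illustration. As far as I can tell, the same two numbers are what R functions such as `pnorm(q, mean, sd)` expect.

```python
from statistics import NormalDist

# Normal(100, 15): the first number is the mean, the second the standard deviation.
# The values 100 and 15 are arbitrary, chosen only for illustration.
dist = NormalDist(mu=100, sigma=15)

# Probability of landing within one standard deviation of the mean.
within_one_sd = dist.cdf(115) - dist.cdf(85)

print(dist.mean, dist.stdev)
print(round(within_one_sd, 4))
```

Reading the two numbers as a minimum and maximum would suggest that values outside 100 and 15 are impossible. In fact only about 68% of values fall within one standard deviation of the mean.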

It's so weird how I'll sit and stare at the same scribbles on the same pages for hours and miss some of the most important points!

Ok, time for me to hit the hay! See you all in the morning...

Tuesday, February 27, 2007

Coley et al 2006

Yesterday for Journal Club we read a paper by Coley et al. (2006) titled "The effects of plant quality on caterpillar growth and defense against natural enemies". Although I think we all enjoyed the read, most of our discussion revolved around aspects of the paper that we found problematic. Among the problems we discussed was one that made me think of what we are and will be learning in Biostats and Experimental Design.

The researchers had opportunistically captured 85 species of caterpillars on 40 different species of plants. They also noted whether the leaf they found each caterpillar on was young or old. The caterpillars were then reared on the same species of plant and the same age of leaf that they had been found on for the remainder of the experiment in order to assess growth rates.

One problem that was brought to my attention by a faculty member, who had reviewed the paper before it went for publication, was that the researchers used an ANOVA to evaluate the caterpillar growth rate data. She explained that an ANOVA was inappropriate because the caterpillars were never randomly assigned treatments. Instead, they were reared only on the plant and leaf age they were found upon, without ever testing to find out how the various species would develop on other, random plant assignments.

I look forward to learning more about how to apply ANOVAs and to the day when I might actually understand how to apply them to my own studies. It is a bit intimidating to know that even very experienced researchers sometimes aren't sure which statistical analyses are best fit for their data!

References:

P. D. Coley, M. L. Bateman, T. A. Kursar (2006). The effects of plant quality on caterpillar growth and defense against natural enemies. Oikos 115 (2), 219–228.

Thursday, February 15, 2007

New Bling...

I just thought I'd explain the new pictures I've posted. I was very excited when I figured out that I could display pictures, so I dug up a few.

The graph is the result of the problem I tackled from Verzani chapter 3. It is a scatterplot of carat size against price (in Singapore dollars) for 48 diamond rings. As you would guess, the correlation is extremely high. I think it was close to 1 when I asked "R" to calculate it for me. When I get a chance, I will try to post my cool new histogram and barplots from Verzani ch 4.
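For the curious, this is roughly the calculation behind that correlation, sketched in Python rather than R. The `pearson_r` helper is my own, and the five carat/price pairs are made up for illustration. They are not the 48 rings from the Verzani data set.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Made-up carat sizes and prices, for illustration only.
carats = [0.15, 0.20, 0.25, 0.30, 0.35]
prices = [320, 500, 640, 860, 1080]

r = pearson_r(carats, prices)
print(round(r, 3))
```

A value near 1 means the points fall close to a straight line that slopes upward, which is what the scatterplot shows.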

The funny little kid in the hood and great big glasses is my nephew at 8 or 9 years old. He's wearing a pair of glasses I bought for him that are supposed to give you insect vision! If you haven't tried them, the lenses are made up of several prisms, so that when you put them on your view becomes hundreds of tiny pictures instead of one large frame. This is supposed to mimic the multi-faceted compound eye of some insects, such as flies. However, I've recently read that this is a misconception because the animal's brain is able to integrate the many views into one single image. I'll have to tell my nephew before he spreads false information to any more 5th graders.

The last two images relate to the data set that I am planning to use for my independent project. My data set was compiled from pitfall trap collections taken in a grassland in South-Central Texas before and after a prescribed burn. Collections were also taken at a nearby control field. The pictures show the manipulated field before and after the burn event. You can see that approximately 85-90% of the living herbaceous matter has been removed by the fire or replaced with standing dead biomass. It will be interesting to see the effects this will have on the resident insect communities. In particular, I would like to focus on the dominant native and invasive ant populations in the field. Some key points are driving my curiosity here:
1) Invasive species are generally positively affected by habitat disturbance.
2) Although native and invasive ants in this field nest below ground, the invasive ants nest much deeper than the natives. Therefore, the invasive species would have an increased ability to retreat away from fatally high temperatures possibly produced by the fire.
3) One of the native ant species is strictly granivorous. The other dominant native ant species is omnivorous, while the invasive species is a highly aggressive, omnivorous scavenger. The variability in diet demands of the dominant ant species in the field makes me wonder how the abrupt loss of vegetation (and supposedly a significant portion of the seed bank, according to the grassland specialists from the Lady Bird Johnson Wildflower Center) will immediately affect these populations.

Thursday, February 1, 2007

Vascular and Photosynthetic Plant Structure Diagrams

Stem and Leaf plots were giving me a headache today in class. I'm still not quite sure how the plot for the "island" problem works, but here's an explanation I found online that helped me grasp the concept a little better.

What Are They Used For?
They are usually used when there are large amounts of numbers to analyze. Series of scores on sports teams, series of temperatures or rainfall over a period of time, and series of classroom test scores are examples of when Stem and Leaf Plots could be used.

What Does a Stem and Leaf Look Like?
Example: Test Scores Out Of 100
Stem | Leaf
9 | 2 2 6 8
8 | 3 5
7 | 2 4 6 8 8 9
6 | 1 4 4 7 8
5 | 0 0 2 8 8

What Does This Stem and Leaf Plot Show?
The stem shows the 'tens' and the leaf shows the 'ones'. At a glance, one can see that 4 students got a mark in the 90's on their test out of 100. Two students received the same mark of 92. No marks were received below 50. No mark of 100 was received. When you count the total number of leaves, you know how many students took the test. The information is nicely organized when a Stem and Leaf Plot is used. Stem and Leaf Plots provide an 'at a glance' tool for specific information in large sets of data; otherwise one would have a long list of marks to sift through and analyze.
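
The example plot above can be rebuilt with a few lines of code. This is a minimal sketch in Python rather than R (I believe R has a built-in `stem()` function for this). The `stem_and_leaf` helper is my own and assumes whole-number scores from 0 to 99. The scores are read back from the example plot.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Group two-digit scores by tens digit (stem) with sorted ones digits (leaves)."""
    plot = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Test scores read back from the example plot above.
scores = [92, 92, 96, 98, 83, 85, 72, 74, 76, 78, 78, 79,
          61, 64, 64, 67, 68, 50, 50, 52, 58, 58]

plot = stem_and_leaf(scores)
for stem in sorted(plot, reverse=True):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Each score is split into its tens digit and its ones digit, so 92 becomes stem 9 and leaf 2. Counting all the leaves gives back the number of students, 22 here.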

I also posted this on the notes for JV Ch 2, but I thought that more people might benefit from it if I posted it here as well.

Rebecca

Tuesday, January 23, 2007

Gotelli & Ellison

MY BOOK FINALLY CAME!

Hopefully I'll be able to start making meaningful contributions in class now! :)

Sunday, January 21, 2007

Brainstorm for Independent Project

For my Independent Project I have two ideas at the present moment.

1) While working at Trinity University, I took part in a study of the effects of a prescribed burn on the invasive red imported fire ant. I have some preliminary data from directly before and after the burn that need to be analyzed. The data consist of pitfall trap counts of native and non-native ants and counts of other insects (identified to order only).

2) I am looking into a local spider diversity project. I'm not sure how much data would be available for analysis at this point. I should find out the answer to that question this week.

More to come on this...

Rebecca

Tuesday, January 16, 2007

logistikos

Today was the first day of Biostatistics and Experimental Design with Mike Guill. Part of our grade will be based on maintaining a blog about our experiences and thoughts on course material.

I named my blog site Mystic Statistics Logistics. Mystic conjures up images of wonder and things unknown while logistics comes from the Greek term logistikos, meaning "skilled in calculating" (which I am not, but hope to be after this course).

Thanks for stopping by my blog pages!

Cheers!
