
## BIOSTATISTICS

PROJECT BACKGROUND: WILL ROGERS’ PHENOMENON

Data in general, and categorical data (or continuous data that are grouped and then analyzed as categorical) in particular, can be misleading: the setup looks simple,
but the reality is not. There are often other factors that must be considered in order to interpret the data correctly. One famous example
of this deceptiveness is the “Will Rogers’ Phenomenon,” also called “Stage Migration.” It arises when subjects in a study are grouped (or
“stratified”) incorrectly. The “Will Rogers’ Phenomenon” is described briefly below with text extracted from Wikipedia.

The Will Rogers’ phenomenon occurs when a subject in one group is moved to another group, and the average value of some measurement goes up in both groups. It is
based on the following quote, attributed to comedian (and transplant to California) Will Rogers: “When the Okies left Oklahoma and moved to California, they raised
the average intelligence level in both states.”

The Will Rogers’ phenomenon will occur when two conditions are met: (a) the individual being moved is below average in his/her current group (removing this individual
will, by definition, raise the average of the remaining individuals in that group), and (b) the individual being moved is above the current average of the group
which he/she is entering (adding the individual to the new group will, by definition, raise the average).

Numerical example: consider the sets of numbers R and S, where R={1, 2, 3, 4} and S={5, 6, 7, 8, 9}. The arithmetic mean of R is 2.5, and the arithmetic mean of S is 7. If 5
(the lowest value in S) is moved from S to R, producing R={1, 2, 3, 4, 5} and S={6, 7, 8, 9}, then the average of R increases to 3 and the average of S increases to
7.5.

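The numerical example can be checked with a few lines of Python. This is only an illustrative sketch that uses the sets R and S from the text; the variable names are chosen here for convenience.

```python
from statistics import mean

# Sets from the numerical example in the text.
R = [1, 2, 3, 4]
S = [5, 6, 7, 8, 9]

means_before = (mean(R), mean(S))

# Move the lowest value of S into R.
moved = min(S)
R_after = R + [moved]
S_after = [x for x in S if x != moved]

means_after = (mean(R_after), mean(S_after))

print("means before the move (R, S):", means_before)
print("means after the move  (R, S):", means_after)
```

The value 5 is below the mean of S and above the mean of R, so both conditions are met and both averages rise.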
One real-world example of the Will Rogers’ phenomenon is seen in the medical concept of stage migration. In medical stage migration, improved detection of illness
leads to the movement of people from the set of healthy people to the set of unhealthy people. Because these people are not healthy, removing them from the set of
healthy people increases the average lifespan of the healthy group. Likewise, the migrated people are healthier than the people already in the unhealthy set, so adding
them raises the average lifespan of that group as well. Both lifespans are statistically lengthened, even if early detection of a cancer does not lead to better
treatment: because it is detected earlier, more time is lived in the “unhealthy” set of people. This phenomenon is described in the article by Feinstein, Sosin and
Wells.
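Stage migration can be illustrated with a small sketch. The survival times below are hypothetical values invented for illustration (they are not data from the Feinstein article), and the "only newer imaging detects" flag is an assumed stand-in for improved detection.

```python
from statistics import mean

# Hypothetical patients: (survival in months, spread that only newer imaging detects).
old_early = [(30, False), (28, False), (26, False), (24, False), (12, True), (10, True)]
old_late = [(6, True), (5, True), (4, True), (3, True)]

# Restaging with the more sensitive method moves the flagged patients to the late stage.
new_early = [p for p in old_early if not p[1]]
new_late = old_late + [p for p in old_early if p[1]]

def mean_survival(group):
    """Average survival in months for a list of (months, flag) tuples."""
    return mean([months for months, _ in group])

print("early stage: old", mean_survival(old_early), "-> new", mean_survival(new_early))
print("late stage:  old", mean_survival(old_late), "-> new", mean_survival(new_late))
print("all patients: old", mean_survival(old_early + old_late),
      "-> new", mean_survival(new_early + new_late))
```

In this sketch the average survival rises in both stages after restaging, while the average over all patients is unchanged, because no individual survival time has changed.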

Feinstein AR, Sosin DM, Wells CK (June 1985). “The Will Rogers phenomenon. Stage migration and new diagnostic techniques as a source of misleading statistics for
survival in cancer”. The New England Journal of Medicine. 312:1604–8. doi:10.1056/NEJM198506203122504. PMID 4000199.

Please read the Feinstein article in the New England Journal of Medicine. The study in this article was initially designed to see whether treatment for lung cancer had improved over
time, as reflected in an improvement in survival, by comparing two cohorts of patients: those who started treatment during the years 1953-1964 and those who started
treatment much later, during 1977. Only by thoughtful analysis were the authors able to understand the patterns correctly.

The questions in this assignment will focus on Tables 3 and 4 of this article. The data that are summarized in Tables 3 and 4 are found in the file Will_Rogers.sav.

– The data in the 2nd column correspond to the staging that was used in Table 3. The patients in the 1953-1964 cohort were staged using methods available during
that time, and the patients in the 1977 cohort were staged using more sensitive methods available in 1977 (but not available in 1953-1964).

– The data in the 3rd column correspond to the data in Table 4 in which the patients in the 1977 cohort were reclassified (i.e. restaged) using only staging
methods available during 1953-1964. So in the 3rd column, all patients were staged using the same methods.

Please focus on the left-hand side of Table 3. The numbers in the top row of the 1st column of data have the following interpretation: There were 281 patients with
lung cancer in the 1953-1964 cohort and of these 281 patients, 211 (75%) were still alive 6 months after start of 1st treatment (or when a decision was made not to
treat). The rest of the numbers can be interpreted in a similar fashion.
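The interpretation above amounts to simple arithmetic, sketched below with the two counts quoted in the text (211 alive out of 281). The variable names are illustrative; the "not alive" count is the complementary cell that a 2×2 table would need.

```python
# Counts quoted in the text for the top row of the 1st column of Table 3.
alive = 211
total = 281

# Complementary cell: patients not alive 6 months after the start of treatment.
not_alive = total - alive

proportion_alive = alive / total

print(f"{alive}/{total} = {proportion_alive:.0%} alive at 6 months; {not_alive} not alive")
```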

(1) Use the data in Table 3 to see if the 6-month survival for patients with lung cancer is different in 1977 compared to 1953-1964.

As described, is this endpoint variable, the 6-month survival, (a) categorical/qualitative or (b) numeric/quantitative?_______

More specifically, which best describes the endpoint variable? (a) dichotomous, (b) continuous, or (c) initially continuous but analyzed as a dichotomous variable?
_________

(2) First, in order to evaluate whether the 6-month survival in 1977 is different from the 6-month survival in 1953-1964, please test to see whether there is an
association between cohorts (1977 vs. 1953-1964) and 6-month survival. Test using α=0.05. Do this ignoring the TNM Stage.
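As a cross-check on SPSS output, an association in a 2×2 table can also be computed by hand. The sketch below shows one common choice, a Pearson chi-square test (without continuity correction) together with an odds ratio and a log-based (Woolf) 95% confidence interval. The counts are hypothetical placeholders, not values from Table 3, and the function names are invented for this illustration.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval; z=1.96 gives about 95%."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts: rows = cohort, columns = (alive at 6 months, not alive).
a, b = 60, 40
c, d = 45, 55

chi2, p_value = chi_square_2x2(a, b, c, d)
odds_ratio, lower, upper = odds_ratio_ci(a, b, c, d)

print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
print(f"odds ratio = {odds_ratio:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

SPSS may report additional variants (for example, a continuity-corrected statistic), so its values need not match this sketch exactly.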

State the null and alternative hypotheses.

Ho:__________________________________________________________________________________

Ha:__________________________________________________________________________________

What test will you use for this comparison?___________________________________________

Is this a one-sided or a two-sided test?________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

What statistic will you use to estimate the magnitude of the association?______________________________

Use SPSS to calculate the test for association and to calculate a measure of association with its 95% confidence interval. Paste a screen shot of the results of your
work at the end of your document.

Complete this Table

6-Month Survival for All Patients with Lung Cancer (based on Staging in Table 3)

| Cohort | Number of Patients: Alive at Least 6 Months | Number of Patients: Total |
|---|---|---|
| 1977 | | 131 |
| 1953-1964 | | 1,266 |
| Total | | |

Value of the test statistic:______________________ (degrees of freedom, if appropriate:___________)

p-value:_____________________________

What is the value of the measure of association and its 95% confidence interval?_________________________________________________________________________

In words, interpret the measure of association:_________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

Based on the test and the measure of association, in words, state your conclusions:

______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

(3) Now examine whether the 6-month survival in 1977 is different from the 6-month survival in 1953-1964, for each of the 3 TNM Stages separately. To do this, please
test to see whether there is an association between cohorts (1977 vs. 1953-1964) and 6-month survival. Test using α=0.05. Do this for each TNM Stage separately.

TNM Stage I Results:

Use SPSS to calculate the test for association and to calculate a measure of association with its 95% confidence interval. Paste a screen shot of the results of your
work at the end of your document. Note that a single screen shot may contain results for several parts (no need to paste duplicates).

Complete this Table

6-Month Survival for Patients with TNM Stage I Lung Cancer (based on Staging in Table 3)

| Cohort | Number of Patients: Alive at Least 6 Months | Number of Patients: Total |
|---|---|---|
| 1977 | | |
| 1953-1964 | | |
| Total | | |

Summary of Analyses – Based on Table 3

| | Stage I | Stage II | Stage III |
|---|---|---|---|
| Test Statistic | | | |
| p-value | | | |
| Measure of Association | | | |
| 95% Confidence Interval | | | |

TNM Stage II Results:

Complete this Table

6-Month Survival for Patients with TNM Stage II Lung Cancer (based on Staging in Table 3)

| Cohort | Number of Patients: Alive at Least 6 Months | Number of Patients: Total |
|---|---|---|
| 1977 | | |
| 1953-1964 | | |
| Total | | |

TNM Stage III Results:

Complete this Table

6-Month Survival for Patients with TNM Stage III Lung Cancer (based on Staging in Table 3)

| Cohort | Number of Patients: Alive at Least 6 Months | Number of Patients: Total |
|---|---|---|
| 1977 | | |
| 1953-1964 | | |
| Total | | |

(4) To further evaluate whether the 6-month survival in 1977 is different from the 6-month survival in 1953-1964, please test to see whether there is an association
between cohorts (1977 vs. 1953-1964) and 6-month survival overall, controlling for (i.e. “stratifying by”) TNM Stage. Again, test using α=0.05.
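A stratified analysis of several 2×2 tables can be sketched by hand as a cross-check. The example below computes a Mantel-Haenszel pooled odds ratio and a Cochran-Mantel-Haenszel chi-square statistic (1 degree of freedom, no continuity correction). The stratum counts are hypothetical placeholders, not values from Table 3, and the function name is invented for this illustration. A confidence interval for the pooled odds ratio is not computed here.

```python
import math

def mantel_haenszel(strata):
    """strata: list of (a, b, c, d) tables, one per stratum, laid out as [[a, b], [c, d]].

    Returns the Mantel-Haenszel pooled odds ratio, the CMH chi-square
    (1 df, no continuity correction), and its p-value.
    """
    or_num = or_den = 0.0
    diff_sum = var_sum = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        or_num += a * d / n
        or_den += b * c / n
        expected_a = (a + b) * (a + c) / n          # expected count in cell a under H0
        diff_sum += a - expected_a
        var_sum += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    pooled_or = or_num / or_den
    chi2 = diff_sum ** 2 / var_sum
    p_value = math.erfc(math.sqrt(chi2 / 2))        # upper tail, chi-square with 1 df
    return pooled_or, chi2, p_value

# Hypothetical strata: rows = cohort, columns = (alive at 6 months, not alive).
strata = [
    (18, 2, 30, 10),   # hypothetical Stage I
    (10, 10, 12, 18),  # hypothetical Stage II
    (4, 16, 6, 34),    # hypothetical Stage III
]

pooled_or, cmh_chi2, cmh_p = mantel_haenszel(strata)
print(f"pooled odds ratio = {pooled_or:.3f}")
print(f"CMH chi-square = {cmh_chi2:.3f}, p = {cmh_p:.4f}")
```

SPSS output may differ slightly from this sketch, for example if a continuity correction is applied. A pooled odds ratio is only meaningful when the stratum-specific odds ratios are reasonably similar.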

State the null and alternative hypotheses.

Ho:__________________________________________________________________________________

Ha:__________________________________________________________________________________

What test will you use for this comparison?___________________________________________

Is this a one-sided or a two-sided test?________________________________________________________

What assumptions do you need to verify in order to justify your choice of test:________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

What statistic will you use to estimate the magnitude of the association?______________________________

Use SPSS to calculate the test for association and to calculate a measure of association with its 95% confidence interval. Paste a screen shot of the results of your
work at the end of your document or indicate where you have obtained the values if you have already included the screen shot

Value of the test statistic:______________________ (degrees of freedom, if appropriate:___________)

p-value:_____________________________

What is the value of the measure of association and its 95% confidence interval?______________________

_______________________________________________________________________________________

Based on this test and the measure of association, in words, state your conclusions regarding the association between cohort and 6-month survival:

______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

(5) Complete the 2×3 table that summarizes the numbers of patients by cohort and TNM Stage (based on Table 3):

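TNM Stage of Patients with Lung Cancer (based on Staging in Table 3)

| Cohort | Number of Patients: TNM Stage I | Number of Patients: TNM Stage II | Number of Patients: TNM Stage III | Total |
|---|---|---|---|---|
| 1977 | | | | |
| 1953-1964 | | | | |
| Total | | | | |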

Please test whether there is an association between Cohort and the TNM Stage of Lung Cancer (based on Table 3), using an α=0.05 level of significance.
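For a table with more than two columns, the same hand calculation generalizes. The sketch below computes a Pearson chi-square statistic for a 2×3 table, along with the expected counts that are usually inspected when checking the test's assumptions. The counts are hypothetical placeholders, not values from Table 3, and the function name is invented for this illustration.

```python
import math

def chi_square_rxc(table):
    """Pearson chi-square for an r x c table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

# Hypothetical counts: rows = cohort, columns = TNM Stage I, II, III.
table = [
    [30, 20, 50],
    [40, 30, 30],
]

chi2_stage, df_stage, expected = chi_square_rxc(table)

# With 2 degrees of freedom the upper-tail probability has the closed form exp(-chi2 / 2).
p_stage = math.exp(-chi2_stage / 2)

smallest_expected = min(min(row) for row in expected)
print(f"chi-square = {chi2_stage:.3f}, df = {df_stage}, p = {p_stage:.4f}")
print(f"smallest expected count = {smallest_expected:.1f}")
```

The closed-form p-value applies only because a 2×3 table has (2−1)×(3−1) = 2 degrees of freedom; other table sizes need the general chi-square distribution.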

State the null and alternative hypotheses.

Ho:__________________________________________________________________________________

Ha:__________________________________________________________________________________

What test will you use for this comparison?___________________________________________

What degrees of freedom will this test statistic have?___________________________________

Is this a one-sided or a two-sided test?________________________________________________________

What assumptions do you need to verify in order to justify your choice of test and are the assumptions fulfilled?

______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

Use SPSS to calculate the test for association. Paste a screen shot of the results of your work at the end of your document.

Value of the test statistic:______________________ p-value:_____________________________

In words, state your conclusions regarding the association between cohort and TNM stage of lung cancer:

______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

(6) Based on Parts (2), (3), (4), and (5), state your conclusions regarding the association between Cohort and 6-month survival for patients with lung cancer.
Explain how you arrived at your conclusions.

______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

(6) Based on the numbers in the 1st column of Table 4, repeat the analyses above. Test to see whether there is an association between cohorts (1977 vs. 1953-1964) and
6-month survival. Test using α=0.05. Do this for each TNM Stage separately. Remember that the difference between Tables 3 and 4 is that in Table 3, the Staging for
the 1977 Cohort is based on the more sensitive imaging methods available in 1977, while in Table 4, the Staging for the 1977 Cohort was redone and was based on the
less sensitive imaging methods that were used for the 1953-1964 cohort.

TNM Stage I Results based on the TNM Staging Given in Table 4:

Use SPSS to calculate the tests for association and to calculate a measure of association with its 95% confidence interval. Paste a screen shot of the results of your
work at the end of your document.

Complete this Table

6-Month Survival for Patients with TNM Stage I Lung Cancer (based on Staging in Table 4)

| Cohort | Number of Patients: Alive at Least 6 Months | Number of Patients: Total |
|---|---|---|
| 1977 | | |
| 1953-1964 | | |
| Total | | |

Summary of Analyses – Based on Table 4

| | Stage I | Stage II | Stage III |
|---|---|---|---|
| Test Statistic | | | |
| p-value | | | |
| Measure of Association | | | |
| 95% Confidence Interval | | | |

TNM Stage II Results:

Complete this Table

6-Month Survival for Patients with TNM Stage II Lung Cancer (based on Staging in Table 4)

| Cohort | Number of Patients: Alive at Least 6 Months | Number of Patients: Total |
|---|---|---|
| 1977 | | |
| 1953-1964 | | |
| Total | | |

TNM Stage III Results:

Complete this Table

6-Month Survival for Patients with TNM Stage III Lung Cancer (based on Staging in Table 4)

| Cohort | Number of Patients: Alive at Least 6 Months | Number of Patients: Total |
|---|---|---|
| 1977 | | |
| 1953-1964 | | |
| Total | | |

(7) To further evaluate, based on Table 4, whether the 6-month survival in 1977 is different from the 6-month survival in 1953-1964, please test to see whether there
is an association between cohorts (1977 vs. 1953-1964) and 6-month survival overall, controlling for (i.e. “stratifying by”) TNM Stage. Again, test using α=0.05.

As before, use SPSS to calculate the tests for an association and to calculate a measure of association with its 95% confidence interval. Paste a screen shot of the
results of your work at the end of your document.

Value of the test statistic:______________________ p-value:_____________________________

What is the value of the measure of association and its 95% confidence interval?______________________

_______________________________________________________________________________________

Based on this test and the measure of association, in words, state your conclusions regarding the association between cohort and 6-month survival:

______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

(8) Complete the 2×3 table that summarizes the numbers of patients by cohort and TNM Stage (based on Table 4):

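TNM Stage of Patients with Lung Cancer (based on Staging in Table 4)

| Cohort | Number of Patients: TNM Stage I | Number of Patients: TNM Stage II | Number of Patients: TNM Stage III | Total |
|---|---|---|---|---|
| 1977 | | | | |
| 1953-1964 | | | | |
| Total | | | | |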

Please test whether there is an association between Cohort and the TNM Stage of Lung Cancer (based on Table 4), using an α=0.05 level of significance.

Value of the test statistic:______________________ p-value:_____________________________

In words, state your conclusions regarding the association between cohort and TNM stage of lung cancer:

______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

(9) Based on Parts (6), (7), and (8), state your conclusions regarding the association between Cohort and 6-month survival for patients with lung cancer. Explain how you
arrived at your conclusions.

______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

(10) Transfer the results from questions (2), (4) and (7) to the Table below:

Summary of Analyses – Based on Table 3

| | Unstratified Test (Question #2) | Stratified Test (Question #4 – Table 3) | Stratified Test (Question #7 – Table 4) |
|---|---|---|---|
| Test Statistic | | | |
| p-value | | | |
| Measure of Association | | | |
| 95% Confidence Interval | | | |

The assessments of association between Cohorts and 6-month survival are different when using Table 3 staging compared to Table 4 staging. Please explain why.
Consider the following questions when you provide your explanation. Why is the analysis based on the Table 3 TNM Staging incorrect? Why is it necessary to stratify
by TNM Stage when using the Staging Criteria in Table 4? And why are the results from (2) different from (9)? Do you think that improvements in treatment over time
for Lung Cancer have resulted in improved 6-month survival overall?

______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________

_______________________________________________________________________________________