## Aligning Ontario’s Scheme for Identifying Census Divisions with Canada’s

Ontario’s Ministry of Finance regularly updates its population projections for the province; its most recent updates were published in Spring 2016. These population projections are organized into four datasets:

• projections for the whole province
• projections for each census division
• projections for each Local Health Integration Network (LHIN)
• projections for each Ministry of Children and Youth Services’ Service Delivery Division (SDD) region

Unfortunately (even inexplicably), Ontario uses a different scheme for identifying Census Divisions from Canada’s. Comparing maps of the two schemes allows us to generate the following table of alignments:

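Table 4. Aligning Ontario’s scheme for identifying Census Divisions with Canada’s.

| CD_ID (Ontario) | CD_ID (Canada) | CD_ID (Ontario) | CD_ID (Canada) |
|---|---|---|---|
| 1 | 20 | 26 | 13 |
| 2 | 18 | 27 | 47 |
| 3 | 24 | 28 | 1 |
| 4 | 21 | 29 | 41 |
| 5 | 19 | 30 | 34 |
| 6 | 29 | 31 | 37 |
| 7 | 22 | 32 | 42 |
| 8 | 28 | 33 | 40 |
| 9 | 46 | 34 | 36 |
| 10 | 25 | 35 | 38 |
| 11 | 44 | 36 | 39 |
| 12 | 26 | 37 | 32 |
| 13 | 14 | 38 | 31 |
| 14 | 15 | 39 | 57 |
| 15 | 43 | 40 | 56 |
| 16 | 16 | 41 | 51 |
| 17 | 30 | 42 | 48 |
| 18 | 23 | 43 | 49 |
| 19 | 6 | 44 | 53 |
| 20 | 10 | 45 | 52 |
| 21 | 12 | 46 | 54 |
| 22 | 9 | 47 | 60 |
| 23 | 7 | 48 | 59 |
| 24 | 11 | 49 | 58 |
| 25 | 2 | | |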

We will need this table of alignments when we come to map the Ministry of Finance’s population projections onto the boundaries of Ontario’s Local Health Integration Networks (LHINs).
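As a minimal sketch of how the alignments might be used in code, the pairs from Table 4 can be held in a lookup dictionary. This assumes that the first number of each pair in Table 4 is Ontario’s CD_ID and the second is Canada’s; the names `ONTARIO_TO_CANADA_CD`, `CANADA_TO_ONTARIO_CD` and `to_canada_cd` are illustrative only.

```python
# Ontario CD_ID -> Canada CD_ID, transcribed from Table 4.
# Assumption: the first number in each pair is Ontario's, the second Canada's.
ONTARIO_TO_CANADA_CD = {
    1: 20, 2: 18, 3: 24, 4: 21, 5: 19, 6: 29, 7: 22, 8: 28, 9: 46, 10: 25,
    11: 44, 12: 26, 13: 14, 14: 15, 15: 43, 16: 16, 17: 30, 18: 23, 19: 6,
    20: 10, 21: 12, 22: 9, 23: 7, 24: 11, 25: 2, 26: 13, 27: 47, 28: 1,
    29: 41, 30: 34, 31: 37, 32: 42, 33: 40, 34: 36, 35: 38, 36: 39, 37: 32,
    38: 31, 39: 57, 40: 56, 41: 51, 42: 48, 43: 49, 44: 53, 45: 52, 46: 54,
    47: 60, 48: 59, 49: 58,
}

# The reverse lookup is well defined because no Canada CD_ID repeats in the table.
CANADA_TO_ONTARIO_CD = {canada: ontario for ontario, canada in ONTARIO_TO_CANADA_CD.items()}


def to_canada_cd(ontario_cd_id: int) -> int:
    """Return the Canada CD_ID aligned with the given Ontario CD_ID."""
    return ONTARIO_TO_CANADA_CD[ontario_cd_id]


print(to_canada_cd(1))           # 20
print(CANADA_TO_ONTARIO_CD[58])  # 49
```

Keeping the table as data, rather than scattering the numbers through the mapping code, makes it easier to correct a single pair if either scheme changes.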

## … and then there were 33

In a series of previous posts, I have been exploring the use of D3 (Data-Driven Documents), a free and open-source software package, to visualize geo-spatial data associated with the Children and Youth Mental Health (CYMH) Service Areas recently established by the Ministry of Children and Youth Services (MCYS) in Ontario. To avoid confusion, I want to alert users of the resources I have published to an important development.

The MCYS has been resourcing the administration and functions of the CYMH Service Areas, including the designation of Lead Agencies, over the past few years. During this time, there has been some uncertainty about whether there were to be thirty-three or thirty-four CYMH Service Areas – turning on whether the James Bay Coast would be its own Service Area or would be merged with the Timiskaming/Cochrane Service Area.

When I began to explore the use of D3 to visualize the CYMH Service Areas, the geo-spatial data published by the Ontario government mapped thirty-four Service Areas (e.g. see the Wayback Machine archive of September 6, 2015), and the resources that I published reflected this configuration. Now the geo-spatial data published by the government maps only thirty-three CYMH Service Areas.

I’ve completed the revision of resources for users in the past few days – a slight inconvenience for us all. There is a bigger issue, though:

The Ontario government is providing an incredibly valuable resource when it publishes the geo-spatial data associated with the administration of public services, like children and youth mental health services. I would only urge that the government’s web pages that describe and make geo-spatial resources available to us should retain and present the different versions of these resources over time. This approach would not only avoid possible confusion as revisions are made, but the differences between the versions may themselves be of interest to the public.

# Policy on standards (revised July 14, 2004)

## Introduction

Statistics Canada aims to ensure that the information it produces provides a consistent and coherent picture of the Canadian economy, society and environment, and that its various datasets can be analyzed together and in combination with information from other sources.

To this end, the Agency pursues three strategic goals:

1. The use of conceptual frameworks, such as the System of National Accounts, that provide a basis for consolidating statistical information about certain sectors or dimensions of the Canadian scene;
2. The use of standard names and definitions for populations, statistical units, concepts, variables and classifications in statistical programs;
3. The use of consistent collection and processing methods for the production of statistical data across surveys.

This Policy deals with the second of these strategic goals. It provides a framework for reviewing, documenting, authorizing, and monitoring the use of standard names and definitions for populations, statistical units, concepts, variables and classifications used in Statistics Canada’s programs. Standards for specific subject-matter areas will be issued from time to time under this Policy as required.

## Policy

Statistics Canada aims to use consistent names and definitions for populations, statistical units, concepts, variables, and classifications used in its statistical programs. To this end:

1. Statistical products will be accompanied by, or make explicit reference to, readily accessible documentation on the definitions of populations, statistical units, concepts, variables and classifications used.
2. Wherever inconsistencies or ambiguities in name or definition are recognized between related statistical units, concepts, variables or classifications, within or across programs, the Agency will work towards the development of a standard for the statistical units, concepts, variables and classifications that harmonize such differences.
3. Standards and guidelines covering particular subject-matter areas will be issued from time to time and their use will be governed by the provisions of this Policy.
4. Where departmental standards have been issued, program areas must follow them unless a specific exemption has been obtained under the provisions of this Policy.
5. Programs should, to the extent possible, collect and retain information at the fundamental or most detailed level of each standard classification in order to provide maximum flexibility in aggregation and facilitate retrospective reclassification as needs change.
6. When a program uses a population, statistical unit, concept, variable or classification not covered by a departmental standard, or uses a variation of a standard approved as an exemption, it shall use a unique name for the entity to distinguish it from any previously defined standard.
7. Clients of Statistics Canada’s consultative services should be made aware of and encouraged to conform to the standards and guidelines issued under this Policy.
8. The Agency will build up a database of names and definitions used in its programs and make this database accessible to users and other players in the statistical system.

## Scope

This policy applies to disseminated data however collected, derived or assembled, and irrespective of the medium of dissemination or the source of funding. This policy may also be applied to data at the stage of collection and processing at Statistics Canada.

## Guidelines for the development and documentation of standards

### A. Introduction

These guidelines describe the requirements and give guidance for the development and documentation of standard names and definitions of populations, statistical units, concepts, variables and classifications. Section B defines the terminology; guidelines follow in Section C.

### B. Terminology

For purposes of these guidelines the following terms are used.

Population: The set of statistical units to which a dataset refers.

Concept: A general or abstract idea that expresses the social and/or economic phenomenon to be measured.

Statistical unit: The unit of observation or measurement for which data are collected or derived. The following list provides examples of standard statistical units that have been defined.

Person
Census family
Economic family
Household
Dwelling
Location
Establishment
Company
Enterprise

Variable: A variable consists of two components, a statistical unit and a property. A property is a characteristic or attribute of the statistical unit.

Classification: A classification is a systematic grouping of the values that a variable can take comprising mutually exclusive classes, covering the full set of values, and often providing a hierarchical structure for aggregating data. More than one classification can be used to represent data for a given variable.

Example: The following is an example of the variable: Age of Person.

Concept:  Based on the subjects used by Statistics Canada to organize its statistical products and metadata, the variable Age of Person is listed under the concept of Population and Demography.

Statistical unit and property: The statistical unit and property that define this variable are Person and Age respectively. Person refers to an individual – this is the unit of analysis for most social statistics programmes. Age refers to the age of a person (or subject) of interest at last birthday (or relative to a specified, well-defined reference date).

Classification:  Different classifications can be used to represent data for this variable. These classifications include: Age Categories, Five-year Age Groups; and Age Categories, Life Cycle Groupings.

The standard names and definitions of populations, statistical units, concepts, variables and classifications will be stored in the Integrated Metadatabase (IMDB). In the case of variables, the name stored in the IMDB will include a representation type, in addition to the statistical unit and property. In the age example given here, the full name of the variable in the IMDB would be Category of Age of Person. The representation type Category indicates that it is a categorical variable, which will be represented by a classification of age groups.
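The naming scheme described above can be sketched in a few lines of code. This is only an illustration of how a variable’s name is composed from its parts; the class `Variable` and its field names are hypothetical and are not taken from the IMDB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """A variable as defined above: a statistical unit plus a property."""

    statistical_unit: str     # e.g. "Person"
    prop: str                 # the property, e.g. "Age"
    representation_type: str  # e.g. "Category" (part of the IMDB name)

    @property
    def name(self) -> str:
        # Name built from the property and the statistical unit.
        return f"{self.prop} of {self.statistical_unit}"

    @property
    def imdb_name(self) -> str:
        # Full name, prefixed with the representation type.
        return f"{self.representation_type} of {self.name}"


age_of_person = Variable(statistical_unit="Person", prop="Age", representation_type="Category")
print(age_of_person.name)       # Age of Person
print(age_of_person.imdb_name)  # Category of Age of Person
```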

### C. Guidelines

Each standard should have the following characteristics:

• describe the concept that the standard addresses when appropriate;
• identify the statistical unit(s) to which it applies;
• provide a name and definition of each variable included in the standard;
• provide the classification(s) to be used in the compilation and dissemination of data on each variable.

The most detailed level of a classification will always be included in a standard. Recommended and optional aggregation structures may also be present.

Concepts shall be described in relation to a framework when possible.

Every variable shall be given a name, in both official languages, which, once given, cannot be used to denote any other variable. Variables shall be defined with explanatory notes in terms of a property and the statistical unit to which it applies. Additionally, in the IMDB, the representation type will be defined.

Every classification shall be given a name, in both official languages, which, once given, cannot be used to denote any other classification. Classifications shall be defined, with exclusions listed and explanatory notes given, where required.

Every class shall be given a name, in both official languages, which, once given, cannot be used to denote any other grouping for the referenced variable within a given “family” of classifications (i.e. a given classification and all its variants). Classes shall be defined, with exclusions listed and explanatory notes given, where required.

The most frequently used populations shall be given a name, in both official languages, which, once given, cannot be used to denote any other population. These populations shall be defined with explanatory notes.

Every statistical unit shall be given a name, in both official languages, which, once given, cannot be used to denote any other statistical unit. Statistical units shall be defined with explanatory notes.

A standard shall be accompanied by a statement of conformity to relevant internationally recognized standards, or a description of the deviations from such a standard and, where possible, a concordance with the referenced standard.

Where a standard replaces an earlier one, a concordance between the old and the new shall be given.

A standard shall include a statement regarding the degree to which its application is compulsory. The different degrees are, in descending order of compulsion:

• departmental standard: a standard that has been approved by the Policy Committee, and the application of which is therefore compulsory, unless an exemption has been explicitly obtained under the terms of this policy;
• recommended standard: a standard that has been recognized by the Methods and Standards Committee as a recommended standard, with or without a trial period of a specified duration, after which it may be declared as a departmental standard;
• program-specific standard: a standard adopted by a statistical program, and which is registered with Standards Division, to ensure consistency in a series over time periods.

# Age of person

## Status

Age of person was approved as a departmental standard on May 22, 2007.

## Definition

Age refers to the age of a person (or subject) of interest at last birthday (or relative to a specified, well-defined reference date).

Person refers to an individual and is the unit of analysis for most social statistics programmes.

## Derivation

Age of person is usually derived. It is usually calculated using the person’s date of birth and the date of interview or other well-defined reference date.
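A minimal sketch of this derivation, assuming both dates are known exactly, is given below. The function name `age_at_last_birthday` is illustrative, and the treatment of a February 29 birthday in a non-leap year (counted as reached on March 1) is an assumption of this sketch rather than something the standard specifies.

```python
from datetime import date


def age_at_last_birthday(date_of_birth: date, reference_date: date) -> int:
    """Return age in completed years at the reference date."""
    if reference_date < date_of_birth:
        raise ValueError("reference date precedes date of birth")
    years = reference_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference_date.month, reference_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


print(age_at_last_birthday(date(2000, 5, 22), date(2007, 5, 21)))  # 6
print(age_at_last_birthday(date(2000, 5, 22), date(2007, 5, 22)))  # 7
```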

## Relation to previous standard

A classification by single years of age has been added. In the classification of five-year age groups, the top five categories in the previous classification have been collapsed into one category. These top categories were collapsed to reflect the population numbers in these categories and the reliability of the data in this part of the age range. The classification Age by life cycle groupings, which was part of the previous standard, is no longer recognized as part of the standard for age.

## Conformity to relevant internationally recognized standards

This standard conforms to the recommendations for censuses contained in the United Nations’ Principles and Recommendations for Population and Housing Censuses, Revision 2, 2008. The UN recommendations define age as “the interval of time between the date of birth and the date of the census, expressed in completed solar years”. This is equivalent to this standard’s definition of age as “age at last birthday”. In addition, the UN recommends calculating age from date of birth rather than asking it directly. This derivation of age is recognized in this standard as the usual practice. Use of date of birth, as noted in the UN Principles, allows age to be calculated precisely, avoiding rounding by respondents and potential misunderstanding as to whether the age wanted is that of the last birthday, the next birthday or the nearest birthday. Finally, in the suggested census output tables, the UN Principles use five-year age groupings with the same boundaries as those presented in this standard. The only differences from this standard are that the upper category has a lower boundary (typically, “85 and over”) and that sometimes children under age 1 year are reported in a separate category.

The Conference of European Statisticians Recommendations for the 2010 Censuses of Population and Housing also recommends that information on age be obtained by collecting information on date of birth.

# Age Categories, Five-Year Age Groups

This classification was replaced by a new departmental standard on May 22, 2007.
| ID | Age Range |
|---|---|
| 10 | 0-4 years |
| 11 | 5-9 years |
| 12 | 10-14 years |
| 13 | 15-19 years |
| 14 | 20-24 years |
| 15 | 25-29 years |
| 16 | 30-34 years |
| 17 | 35-39 years |
| 18 | 40-44 years |
| 19 | 45-49 years |
| 20 | 50-54 years |
| 21 | 55-59 years |
| 22 | 60-64 years |
| 23 | 65-69 years |
| 24 | 70-74 years |
| 25 | 75-79 years |
| 26 | 80-84 years |
| 27 | 85-89 years |
| 28 | 90-94 years |
| 29 | 95-99 years |
| 30 | 100-104 years |
| 31 | 105-109 years |
| 32 | 110-114 years |
| 33 | 115-119 years |
| 34 | 120-124 years |
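Because the IDs in this classification run consecutively from 10 and every group spans five years, the ID can be computed directly from an age in completed years. The sketch below illustrates this; the function names are hypothetical and are not part of the standard.

```python
def five_year_group_id(age: int) -> int:
    """Return the ID (10 to 34) of the five-year age group containing `age`."""
    if not 0 <= age <= 124:
        raise ValueError("age must be between 0 and 124")
    return 10 + age // 5


def five_year_group_label(group_id: int) -> str:
    """Return the age range label for a group ID, e.g. 27 -> '85-89 years'."""
    if not 10 <= group_id <= 34:
        raise ValueError("group ID must be between 10 and 34")
    lower = (group_id - 10) * 5
    return f"{lower}-{lower + 4} years"


print(five_year_group_id(87))     # 27
print(five_year_group_label(27))  # 85-89 years
```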

# Age Categories, Life Cycle Groupings

| Code | Class |
|---|---|
| 1 | Children (00-14 years) |
| 11 | 00-04 years |
| 110 | 00-04 years |
| 12 | 05-09 years |
| 120 | 05-09 years |
| 13 | 10-14 years |
| 130 | 10-14 years |
| 2 | Youth (15-24 years) |
| 21 | 15-19 years |
| 211 | 15-17 years |
| 212 | 18-19 years |
| 22 | 20-24 years |
| 221 | 20-21 years |
| 222 | 22-24 years |
| 3 | Adults (25-64 years) |
| 31 | 25-29 years |
| 310 | 25-29 years |
| 32 | 30-34 years |
| 320 | 30-34 years |
| 33 | 35-39 years |
| 330 | 35-39 years |
| 34 | 40-44 years |
| 340 | 40-44 years |
| 35 | 45-49 years |
| 350 | 45-49 years |
| 36 | 50-54 years |
| 360 | 50-54 years |
| 37 | 55-59 years |
| 370 | 55-59 years |
| 38 | 60-64 years |
| 380 | 60-64 years |
| 4 | Seniors (65 years and over) |
| 41 | 65-69 years |
| 410 | 65-69 years |
| 42 | 70-74 years |
| 420 | 70-74 years |
| 43 | 75-79 years |
| 430 | 75-79 years |
| 44 | 80-84 years |
| 440 | 80-84 years |
| 45 | 85-89 years |
| 450 | 85-89 years |
| 46 | 90 years and over |
| 460 | 90 years and over |
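In the codes listed for this classification, each more detailed class appears to extend the code of its parent by one digit (for example, 211 sits under 21, which sits under 2). Assuming that prefix structure holds throughout, the parent classes of any code can be recovered by truncation, as in this sketch; the names `ancestors` and `life_cycle_group` are illustrative only.

```python
from typing import List

# Top-level life cycle groups, as listed in the classification.
LIFE_CYCLE_GROUPS = {
    "1": "Children (00-14 years)",
    "2": "Youth (15-24 years)",
    "3": "Adults (25-64 years)",
    "4": "Seniors (65 years and over)",
}


def ancestors(code: str) -> List[str]:
    """Return the parent codes of `code`, from broadest to narrowest."""
    return [code[:n] for n in range(1, len(code))]


def life_cycle_group(code: str) -> str:
    """Return the top-level group that a code of any level belongs to."""
    return LIFE_CYCLE_GROUPS[code[0]]


print(ancestors("211"))         # ['2', '21']
print(life_cycle_group("460"))  # Seniors (65 years and over)
```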

## Agresti, A – Datasets

This site contains data sets that are not shown completely in text examples and exercises. (The numbering refers to the 3rd edition, 2013.)

1. Horseshoe crab data set of Table 4.3

(Here y is whether a female crab has a satellite (1=yes, 0=no) and weight is in grams, rather than kg as in the text. Also, color has values 1-5 with 1=light; there were no crabs of color 1, so in the text, color was re-coded as color – 1 to give values 1, 2, 3, 4.)


color spine width satell weight y
3  3  28.3  8  3050 1
4  3  22.5  0  1550 0
2  1  26.0  9  2300 1
4  3  24.8  0  2100 0
4  3  26.0  4  2600 1
3  3  23.8  0  2100 0
2  1  26.5  0  2350 0
4  2  24.7  0  1900 0
3  1  23.7  0  1950 0
4  3  25.6  0  2150 0
4  3  24.3  0  2150 0
3  3  25.8  0  2650 0
3  3  28.2  11 3050 1
5  2  21.0  0  1850 0
3  1  26.0  14 2300 1
2  1  27.1  8  2950 1
3  3  25.2  1  2000 1
3  3  29.0  1  3000 1
5  3  24.7  0  2200 0
3  3  27.4  5  2700 1
3  2  23.2  4  1950 1
2  2  25.0  3  2300 1
3  1  22.5  1  1600 1
4  3  26.7  2  2600 1
5  3  25.8  3  2000 1
5  3  26.2  0  1300 0
3  3  28.7  3  3150 1
3  1  26.8  5  2700 1
5  3  27.5  0  2600 0
3  3  24.9  0  2100 0
2  1  29.3  4  3200 1
2  3  25.8  0  2600 0
3  2  25.7  0  2000 0
3  1  25.7  8  2000 1
3  1  26.7  5  2700 1
5  3  23.7  0  1850 0
3  3  26.8  0  2650 0
3  3  27.5  6  3150 1
5  3  23.4  0  1900 0
3  3  27.9  6  2800 1
4  3  27.5  3  3100 1
2  1  26.1  5  2800 1
2  1  27.7  6  2500 1
3  1  30.0  5  3300 1
4  1  28.5  9  3250 1
4  3  28.9  4  2800 1
3  3  28.2  6  2600 1
3  3  25.0  4  2100 1
3  3  28.5  3  3000 1
3  1  30.3  3  3600 1
5  3  24.7  5  2100 1
3  3  27.7  5  2900 1
2  1  27.4  6  2700 1
3  3  22.9  4  1600 1
3  1  25.7  5  2000 1
3  3  28.3  15 3000 1
3  3  27.2  3  2700 1
4  3  26.2  3  2300 1
3  1  27.8  0  2750 0
5  3  25.5  0  2250 0
4  3  27.1  0  2550 0
4  3  24.5  5  2050 1
4  1  27.0  3  2450 1
3  3  26.0  5  2150 1
3  3  28.0  1  2800 1
3  3  30.0  8  3050 1
3  3  29.0  10 3200 1
3  3  26.2  0  2400 0
3  1  26.5  0  1300 0
3  3  26.2  3  2400 1
4  3  25.6  7  2800 1
4  3  23.0  1  1650 1
4  3  23.0  0  1800 0
3  3  25.4  6  2250 1
4  3  24.2  0  1900 0
3  2  22.9  0  1600 0
4  2  26.0  3  2200 1
3  3  25.4  4  2250 1
4  3  25.7  0  1200 0
3  3  25.1  5  2100 1
4  2  24.5  0  2250 0
5  3  27.5  0  2900 0
4  3  23.1  0  1650 0
4  1  25.9  4  2550 1
3  3  25.8  0  2300 0
5  3  27.0  3  2250 1
3  3  28.5  0  3050 0
5  1  25.5  0  2750 0
5  3  23.5  0  1900 0
3  2  24.0  0  1700 0
3  1  29.7  5  3850 1
3  1  26.8  0  2550 0
5  3  26.7  0  2450 0
3  1  28.7  0  3200 0
4  3  23.1  0  1550 0
3  1  29.0  1  2800 1
4  3  25.5  0  2250 0
4  3  26.5  1  1967 1
4  3  24.5  1  2200 1
4  3  28.5  1  3000 1
3  3  28.2  1  2867 1
3  3  24.5  1  1600 1
3  3  27.5  1  2550 1
3  2  24.7  4  2550 1
3  1  25.2  1  2000 1
4  3  27.3  1  2900 1
3  3  26.3  1  2400 1
3  3  29.0  1  3100 1
3  3  25.3  2  1900 1
3  3  26.5  4  2300 1
3  3  27.8  3  3250 1
3  3  27.0  6  2500 1
4  3  25.7  0  2100 0
3  3  25.0  2  2100 1
3  3  31.9  2  3325 1
5  3  23.7  0  1800 0
5  3  29.3  12 3225 1
4  3  22.0  0  1400 0
3  3  25.0  5  2400 1
4  3  27.0  6  2500 1
4  3  23.8  6  1800 1
2  1  30.2  2  3275 1
4  3  26.2  0  2225 0
3  3  24.2  2  1650 1
3  3  27.4  3  2900 1
3  2  25.4  0  2300 0
4  3  28.4  3  3200 1
5  3  22.5  4  1475 1
3  3  26.2  2  2025 1
3  1  24.9  6  2300 1
2  2  24.5  6  1950 1
3  3  25.1  0  1800 0
3  1  28.0  4  2900 1
5  3  25.8  10 2250 1
3  3  27.9  7  3050 1
3  3  24.9  0  2200 0
3  1  28.4  5  3100 1
4  3  27.2  5  2400 1
3  2  25.0  6  2250 1
3  3  27.5  6  2625 1
3  1  33.5  7  5200 1
3  3  30.5  3  3325 1
4  3  29.0  3  2925 1
3  1  24.3  0  2000 0
3  3  25.8  0  2400 0
5  3  25.0  8  2100 1
3  1  31.7  4  3725 1
3  3  29.5  4  3025 1
4  3  24.0  10 1900 1
3  3  30.0  9  3000 1
3  3  27.6  4  2850 1
3  3  26.2  0  2300 0
3  1  23.1  0  2000 0
3  1  22.9  0  1600 0
5  3  24.5  0  1900 0
3  3  24.7  4  1950 1
3  3  28.3  0  3200 0
3  3  23.9  2  1850 1
4  3  23.8  0  1800 0
4  2  29.8  4  3500 1
3  3  26.5  4  2350 1
3  3  26.0  3  2275 1
3  3  28.2  8  3050 1
5  3  25.7  0  2150 0
3  3  26.5  7  2750 1
3  3  25.8  0  2200 0
4  3  24.1  0  1800 0
4  3  26.2  2  2175 1
4  3  26.1  3  2750 1
4  3  29.0  4  3275 1
2  1  28.0  0  2625 0
5  3  27.0  0  2625 0
3  2  24.5  0  2000 0
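As a minimal sketch of how these rows might be read, the snippet below parses the whitespace-delimited layout and applies the two conversions mentioned in the note above: weight from grams to kilograms, and color re-coded as color - 1. Only the first three rows are embedded as a sample; `SAMPLE`, `parse_crabs` and the `weight_kg` key are names chosen here for illustration.

```python
# First three rows of the horseshoe crab data, copied from the listing above.
SAMPLE = """\
color spine width satell weight y
3  3  28.3  8  3050 1
4  3  22.5  0  1550 0
2  1  26.0  9  2300 1
"""


def parse_crabs(text: str) -> list:
    """Parse whitespace-delimited crab rows into a list of dictionaries."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    crabs = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        crabs.append({
            "color": int(row["color"]) - 1,           # re-coded as color - 1, as in the text
            "spine": int(row["spine"]),
            "width": float(row["width"]),
            "satell": int(row["satell"]),
            "weight_kg": int(row["weight"]) / 1000,   # grams -> kilograms
            "y": int(row["y"]),
        })
    return crabs


crabs = parse_crabs(SAMPLE)
# y should be 1 exactly when the crab has at least one satellite.
print(all(c["y"] == int(c["satell"] > 0) for c in crabs))  # True
```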


2. Teratology study data set of Table 4.7


litter group n y
1  1 10 1
2 1 11 4
3 1 12 9
4 1 4 4
5 1 10 10
6 1 11 9
7  1 9  9
8 1 11 11
9 1 10 10
10 1 10 7
11 1 12 12
12 1 10 9
13 1 8  8
14 1 11  9
15 1 6  4
16 1  9 7
17 1 14 14
18 1 12 7
19 1 11 9
20 1 13 8
21 1 14 5
22 1 10 10
23 1 12 10
24 1 13 8
25 1 10 10
26 1 14 3
27 1 13 13
28 1 4 3
29 1  8  8
30 1 13 5
31 1 12 12
32 2 10 1
33 2  3  1
34 2 13 1
35 2 12  0
36 2 14 4
37 2  9  2
38 2 13 2
39 2 16  1
40 2 11 0
41 2  4  0
42 2 1  0
43 2 12 0
44 3  8 0
45 3 11  1
46 3 14 0
47 3 14 1
48 3 11 0
49 4  3 0
50 4 13 0
51 4 9   2
52 4 17 2
53 4 15 0
54 4 2 0
55 4 14 1
56 4 8  0
57 4 6  0
58 4 17 0


3. Ray Allen data set for Exercise 4.13


1   0  4
2   7  9
3   4 11
4   3  6
5   5  6
6   2  7
7   3  7
8   0  1
9   1  8
10  6  9
11  0  5
12  2  5
13  0  5
14  2  4
15  5  7
16  1  3
17  3  7
18  0  2
19  8 11
20  0  8
21  0  4
22  0  4
23  2  5
24  2  7


4. Rajon Rondo assists data set for Exercise 5.3


assists result * 1=win, last 9 observations are playoffs
17 1
9 0
24 1
17 1
15 1
11 1
10 1
15 0
16 1
17 1
13 1
7 0
14 1
12 1
10 1
19 1
13 1
14 1
8 1
14 1
8 1
16 1
23 1
7 1
8 0
12 0
13 1
13 1
12 1
8 1
12 1
9 0
10 1
5 1
6 0
16 1
10 1
12 0
7 1
14 0
10 0
10 1
8 1
15 1
8 0
11 1
11 1
15 1
16 1
8 1
9 0
5 0
3 1
9 0
8 1
6 0
5 1
12 1
11 0
5 0
8 0
14 1
5 0
14 1
13 1
6 0
14 1
5 0

9 1
7 1
20 1
12 1
7 0
12 0
11 1
5 0
3 0


5. Data on Italian credit cards, for Exercise 5.22


income n y
24  1  0
34  7  1
48  1  0
70  5  3
27  1  0
35  1  1
49  1  0
79  1  0
28  5  2
38  3  1
50  10  2
80  1  0
29  3  0
39  2  0
52  1  0
84  1  0
30  9  1
40  5  0
59  1  0
94  1  0
31  5  1
41  2  0
60  5  2
120  6  6
32  8  0
42  2  0
65  6  6
130  1  1
33  1  0
45  1  1
68  3  3


6. Full data set for Table 6.2 on endometrial cancer grade


nv pi eh hg * standardized versions use nv2=(nv-0.5); pi2=(pi-17.3797)/9.9978; eh2=(eh-1.6616)/0.6621
0 13 1.64 0
0 16 2.26 0
0  8 3.14 0
0 34 2.68 0
0 20 1.28 0
0  5 2.31 0
0 17 1.80 0
0 10 1.68 0
0 26 1.56 0
0 17 2.31 0
0  8 2.01 0
0  7 1.89 0
0 20 3.15 0
0 10 1.23 0
0 18 1.27 0
0 16 1.76 0
0 18 2.00 0
0  8 2.64 1
0 29 0.88 1
0 12 1.27 1
0 20 1.37 1
1 38 0.97 1
1 22 1.14 1
1  7 0.88 1
1 25 0.91 1
1 15 0.58 1
0  7 0.97 1
0 28 1.50 0
0 11 1.33 0
0 19 2.37 0
0 10 1.82 0
0 10 3.13 0
0 18 1.31 0
0 14 1.92 0
0 21 1.64 0
0 11 2.01 0
0 17 1.88 0
0 25 1.93 0
0 16 2.11 0
0 19 1.29 0
0 15 1.72 0
0 33 0.75 0
0 24 1.92 0
0 48 1.84 1
0 12 1.11 1
0 19 1.61 1
0  2 1.18 1
1 22 1.44 1
1 40 1.18 1
1  5 0.93 1
1  0 1.17 1
0 21 1.19 1
0 15 1.06 1
0 29 2.02 0
0 15 2.29 0
0 12 2.33 0
0  3 2.90 0
0 20 1.70 0
0 23 1.41 0
0 12 2.25 0
0 22 1.54 0
0 42 1.97 0
0 15 1.75 0
0 13 2.16 0
0 14 2.57 0
0 19 1.37 0
0 12 3.61 0
0 13 2.04 0
0 10 2.17 0
0 12 1.69 1
1 49 0.27 1
0  6 1.84 1
0  5 1.30 1
0 17 0.96 1
1 11 1.01 1
1 21 0.98 1
0  5 0.35 1
1 19 1.02 1
0 33 0.85 1
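The standardization given in the header note of this data set can be applied row by row. The sketch below uses the constants exactly as stated there; they are presumably sample means and standard deviations, although the note does not say so, and the function name `standardize` is illustrative.

```python
def standardize(nv: float, pi: float, eh: float) -> tuple:
    """Apply the standardization stated in the data set's header note."""
    nv2 = nv - 0.5
    pi2 = (pi - 17.3797) / 9.9978
    eh2 = (eh - 1.6616) / 0.6621
    return nv2, pi2, eh2


# First data row above: nv=0, pi=13, eh=1.64
nv2, pi2, eh2 = standardize(0, 13, 1.64)
print(nv2)  # -0.5
```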


7. Clinical trials data set of Table 6.9


center treat response count
a 1 1 11
a 1 2 25
a 2 1 10
a 2 2 27
b 1 1 16
b 1 2 4
b 2 1 22
b 2 2 10
c 1 1 14
c 1 2 5
c 2 1 7
c 2 2 12
d 1 1 2
d 1 2 14
d 2 1 1
d 2 2 16
e 1 1 6
e 1 2 11
e 2 1 0
e 2 2 12
f 1 1 1
f 1 2 10
f 2 1 0
f 2 2 10
g 1 1 1
g 1 2 4
g 2 1 1
g 2 2 8
h 1 1 4
h 1 2 2
h 2 1 6
h 2 2 1


8. Data set for Exercises 6.3 and 9.13


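Counts by political views (rows) and by premarital sex (PS), religious attendance (RA) and birth control (BC) (columns):

| Political views | PS 1, RA 1, BC 1 | PS 1, RA 1, BC 2 | PS 1, RA 2, BC 1 | PS 1, RA 2, BC 2 | PS 2, RA 1, BC 1 | PS 2, RA 1, BC 2 | PS 2, RA 2, BC 1 | PS 2, RA 2, BC 2 |
|---|---|---|---|---|---|---|---|---|
| 1 | 99 | 15 | 73 | 25 | 8 | 4 | 24 | 22 |
| 2 | 73 | 20 | 87 | 37 | 20 | 13 | 50 | 60 |
| 3 | 51 | 19 | 51 | 36 | 6 | 12 | 33 | 88 |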


9. Data set for Exercise 6.7


<35     35-44     >44        <35    35-44    >44

Region           M    F    M   F    M    F     M   F   M   F   M   F

Northeast
Satisfied      288   60  224  35  337   70    38  19  32  22  21  15
Not satisfied  177   57  166  19  172   30    33  35  11  20   8  10

Mid-Atlantic
Satisfied       90   19   96  12  124   17    18  13   7   0   9   1
Not satisfied   45   12   42   5   39    2     6   7   2   3   2   1

Southern
Satisfied      226   88  189  44  156   70    45  47  18  13  11   9
Not satisfied  128   57  117  34   73   25    31  35   3   7   2   2

Midwest
Satisfied      285  110  225  53  324   60    40  66  19  25  22  11
Not satisfied  179   93  141  24  140   47    25  56  11  19   2  12

Northwest
Satisfied      270  176  215  80  269  110    36  25   9  11  16   4
Not satisfied  180  151  108  40  136   40    20  16   7   5   3   5

Southwest
Satisfied      252   97  162  47  199   62    69  45  14   8  14   2
Not satisfied  126   61   72  27   93   24    27  36   7   4   5   0

Pacific
Satisfied      119   62   66  20   67   25    45  22  15  10   8   6
Not satisfied   58   33   20  10   21   10    16  15  10   8   6   2


10. Data on surgery and sore throats in Table 6.15, for Exercise 6.8


D  T  Y
45 0 0
15 0 0
40 0 1
83 1 1
90 1 1
25 1 1
35 0 1
65 0 1
95 0 1
35 0 1
75 0 1
45 1 1
50 1 0
75 1 1
30 0 0
25 0 1
20 1 0
60 1 1
70 1 1
30 0 1
60 0 1
61 0 0
65 0 1
15 1 0
20 1 0
45 0 1
15 1 0
25 0 1
15 1 0
30 0 1
40 0 1
15 1 0
135 1 1
20 1 0
40 1 0


11. Data on incontinence study, for Exercise 6.20


y x1 x2 x3
0  -1.9  -5.3  -43
0  -0.1  -5.2  -32
0   0.8  -3.0  -12
0   0.9   3.4    1
1  -5.6 -13.1   -1
1  -2.4   1.8   -9
1  -2.0  -5.7   -7
1  -0.6  -2.4   -7
1  -0.1 -10.2   -5
1   0.4 -17.2   -9
1   1.1  -4.5  -15
0  -1.5   3.9  -15
0   0.5  27.5    8
0   0.8  -1.6   -2
0   2.3  23.4   14
1  -5.3 -19.8  -33
1  -2.3  -7.4    4
1  -1.7  -3.9   13
1  -0.5 -14.5  -12
1  -0.1  -9.9  -11
1   0.7 -10.7  -10


12. Data set for Exercise 6.28


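| Gender | Residence | IQ | Socioeconomic status | Occupational aspirations: High | Occupational aspirations: Low |
|---|---|---|---|---|---|
| Male | Rural | High | High | 117 | 47 |
| Male | Rural | High | Low | 54 | 87 |
| Male | Rural | Low | High | 29 | 78 |
| Male | Rural | Low | Low | 31 | 262 |
| Male | Small urban | High | High | 350 | 80 |
| Male | Small urban | High | Low | 70 | 85 |
| Male | Small urban | Low | High | 71 | 120 |
| Male | Small urban | Low | Low | 33 | 265 |
| Male | Large urban | High | High | 151 | 31 |
| Male | Large urban | High | Low | 27 | 23 |
| Male | Large urban | Low | High | 30 | 27 |
| Male | Large urban | Low | Low | 12 | 52 |
| Female | Rural | High | High | 102 | 69 |
| Female | Rural | High | Low | 52 | 119 |
| Female | Rural | Low | High | 32 | 73 |
| Female | Rural | Low | Low | 28 | 349 |
| Female | Small urban | High | High | 338 | 96 |
| Female | Small urban | High | Low | 44 | 99 |
| Female | Small urban | Low | High | 76 | 107 |
| Female | Small urban | Low | Low | 22 | 344 |
| Female | Large urban | High | High | 148 | 35 |
| Female | Large urban | High | Low | 17 | 39 |
| Female | Large urban | Low | High | 21 | 47 |
| Female | Large urban | Low | Low | 6 | 116 |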


13. Alligator food choice data set of Table 8.1


lake gender size food count
1 1 1 1 7
1 1 1 2 1
1 1 1 3 0
1 1 1 4 0
1 1 1 5 5
1 1 2 1 4
1 1 2 2 0
1 1 2 3 0
1 1 2 4 1
1 1 2 5 2
1 2 1 1 16
1 2 1 2 3
1 2 1 3 2
1 2 1 4 2
1 2 1 5 3
1 2 2 1 3
1 2 2 2 0
1 2 2 3 1
1 2 2 4 2
1 2 2 5 3
2 1 1 1 2
2 1 1 2 2
2 1 1 3 0
2 1 1 4 0
2 1 1 5 1
2 1 2 1 13
2 1 2 2 7
2 1 2 3 6
2 1 2 4 0
2 1 2 5 0
2 2 1 1 3
2 2 1 2 9
2 2 1 3 1
2 2 1 4 0
2 2 1 5 2
2 2 2 1 0
2 2 2 2 1
2 2 2 3 0
2 2 2 4 1
2 2 2 5 0
3 1 1 1 3
3 1 1 2 7
3 1 1 3 1
3 1 1 4 0
3 1 1 5 1
3 1 2 1 8
3 1 2 2 6
3 1 2 3 6
3 1 2 4 3
3 1 2 5 5
3 2 1 1 2
3 2 1 2 4
3 2 1 3 1
3 2 1 4 1
3 2 1 5 4
3 2 2 1 0
3 2 2 2 1
3 2 2 3 0
3 2 2 4 0
3 2 2 5 0
4 1 1 1 13
4 1 1 2 10
4 1 1 3 0
4 1 1 4 2
4 1 1 5 2
4 1 2 1 9
4 1 2 2 0
4 1 2 3 0
4 1 2 4 1
4 1 2 5 2
4 2 1 1 3
4 2 1 2 9
4 2 1 3 1
4 2 1 4 0
4 2 1 5 1
4 2 2 1 8
4 2 2 2 1
4 2 2 3 0
4 2 2 4 0
4 2 2 5 1


14. Full data set for Table 8.5 on happiness, traumatic events, and race


race trauma happy * race is 0=white, 1=black
0 0 1
0 0 1
0 0 1
0 0 1
0 0 1
0 0 1
0 0 1
0 0 2
0 0 2
0 0 2
0 0 2
0 0 2
0 0 2
0 0 2
0 0 2
0 0 2
0 0 2
0 0 2
0 0 2
0 0 2
0 0 2
0 0 2
0 0 3
0 1 1
0 1 1
0 1 1
0 1 1
0 1 1
0 1 1
0 1 1
0 1 1
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
0 1 3
0 2 1
0 2 1
0 2 1
0 2 1
0 2 1
0 2 2
0 2 2
0 2 2
0 2 2
0 2 2
0 2 2
0 2 2
0 2 2
0 2 2
0 2 2
0 2 2
0 2 2
0 2 2
0 2 2
0 2 2
0 2 2
0 2 3
0 3 1
0 3 2
0 3 2
0 3 2
0 3 2
0 3 2
0 3 2
0 3 2
0 3 2
0 3 2
0 3 3
0 4 1
0 4 2
0 4 2
0 4 2
0 4 2
0 5 3
0 5 3
1 0 2
1 0 3
1 1 2
1 1 2
1 1 2
1 1 3
1 2 2
1 2 2
1 2 2
1 2 3
1 3 2
1 3 2
1 3 3


15. Data for Exercise 8.18 on dumping severity


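| Operation | Hospital 1 N | Hospital 1 S | Hospital 1 M | Hospital 2 N | Hospital 2 S | Hospital 2 M | Hospital 3 N | Hospital 3 S | Hospital 3 M | Hospital 4 N | Hospital 4 S | Hospital 4 M |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 23 | 7 | 2 | 18 | 6 | 1 | 8 | 6 | 3 | 12 | 9 | 1 |
| B | 23 | 10 | 5 | 18 | 6 | 2 | 12 | 4 | 4 | 15 | 3 | 2 |
| C | 20 | 13 | 5 | 13 | 13 | 2 | 11 | 6 | 2 | 14 | 8 | 3 |
| D | 24 | 10 | 6 | 9 | 15 | 2 | 7 | 7 | 4 | 13 | 6 | 4 |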


16. Data for Exercise 8.28 on satisfaction with housing


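| Housing | Influence | Low contact: Satisfaction Low | Low contact: Satisfaction Medium | Low contact: Satisfaction High | High contact: Satisfaction Low | High contact: Satisfaction Medium | High contact: Satisfaction High |
|---|---|---|---|---|---|---|---|
| Tower blocks | Low | 21 | 21 | 28 | 14 | 19 | 37 |
| Tower blocks | Medium | 34 | 22 | 36 | 17 | 23 | 40 |
| Tower blocks | High | 10 | 11 | 36 | 3 | 5 | 23 |
| Apartments | Low | 61 | 23 | 17 | 78 | 46 | 43 |
| Apartments | Medium | 43 | 35 | 40 | 48 | 45 | 86 |
| Apartments | High | 26 | 18 | 54 | 15 | 25 | 62 |
| Atrium houses | Low | 13 | 9 | 10 | 20 | 23 | 20 |
| Atrium houses | Medium | 8 | 8 | 12 | 10 | 22 | 24 |
| Atrium houses | High | 6 | 7 | 9 | 7 | 10 | 21 |
| Terraced houses | Low | 18 | 6 | 7 | 57 | 23 | 13 |
| Terraced houses | Medium | 15 | 13 | 13 | 31 | 21 | 13 |
| Terraced houses | High | 7 | 5 | 11 | 5 | 6 | 13 |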


17. High school student survey data set of Table 9.3


a c m count
1 1 1 911
1 1 2 538
1 2 1 44
1 2 2 456
2 1 1   3
2 1 2  43
2 2 1  2
2 2 2 279


18. Government spending data set for Exercise 9.5


e h c l count
1 1 1 1 62
1 1 2 1 90
1 1 3 1 74
1 2 1 1 11
1 2 2 1 22
1 2 3 1 19
1 3 1 1 2
1 3 2 1 2
1 3 3 1 1
2 1 1 1 11
2 1 2 1 21
2 1 3 1 20
2 2 1 1 1
2 2 2 1 6
2 2 3 1 6
2 3 1 1 1
2 3 2 1 2
2 3 3 1 4
3 1 1 1 3
3 1 2 1 2
3 1 3 1 9
3 2 1 1 1
3 2 2 1 2
3 2 3 1 4
3 3 1 1 1
3 3 2 1 0
3 3 3 1 1
1 1 1 2 17
1 1 2 2 42
1 1 3 2 31
1 2 1 2 7
1 2 2 2 18
1 2 3 2 14
1 3 1 2 3
1 3 2 2 0
1 3 3 2 3
2 1 1 2 3
2 1 2 2 13
2 1 3 2 8
2 2 1 2 4
2 2 2 2 9
2 2 3 2 5
2 3 1 2 0
2 3 2 2 1
2 3 3 2 3
3 1 1 2 0
3 1 2 2 1
3 1 3 2 2
3 2 1 2 0
3 2 2 2 1
3 2 3 2 2
3 3 1 2 0
3 3 2 2 0
3 3 3 2 2
1 1 1 3 5
1 1 2 3 3
1 1 3 3 11
1 2 1 3 0
1 2 2 3 1
1 2 3 3 3
1 3 1 3 1
1 3 2 3 1
1 3 3 3 1
2 1 1 3 0
2 1 2 3 2
2 1 3 3 3
2 2 1 3 0
2 2 2 3 0
2 2 3 3 2
2 3 1 3 1
2 3 2 3 1
2 3 3 3 1
3 1 1 3 0
3 1 2 3 0
3 1 3 3 1
3 2 1 3 0
3 2 2 3 0
3 2 3 3 0
3 3 1 3 0
3 3 2 3 0
3 3 3 3 3


19. Alcohol, cigarette, and marijuana use, by gender and race, as in Table 10.1


a c m r g count
1 1 1 1 1 405
1 1 1 2 1  23
1 2 1 1 1  13
1 2 1 2 1   2
2 1 1 1 1   1
2 1 1 2 1   0
2 2 1 1 1   1
2 2 1 2 1   0
1 1 2 1 1 268
1 1 2 2 1  23
1 2 2 1 1 218
1 2 2 2 1  19
2 1 2 1 1  17
2 1 2 2 1   1
2 2 2 1 1 117
2 2 2 2 1  12
1 1 1 1 2 453
1 1 1 2 2  30
1 2 1 1 2  28
1 2 1 2 2   1
2 1 1 1 2   1
2 1 1 2 2   1
2 2 1 1 2   1
2 2 1 2 2   0
1 1 2 1 2 228
1 1 2 2 2  19
1 2 2 1 2 201
1 2 2 2 2  18
2 1 2 1 2  17
2 1 2 2 2   8
2 2 2 1 2 133
2 2 2 2 2  17


20. Opinions about birth control and premarital sex data set of Table 10.3


premar birth count
1 4  38
1 3  60
1 2  68
1 1  81
2 4  14
2 3  29
2 2  26
2 1  24
3 4  42
3 3  74
3 2  41
3 1  18
4 4 157
4 3 161
4 2  57
4 1  36


21. Migration data set of Table 11.5


row column count
ne ne 266
ne mw  15
ne  s  61
ne  w  28
mw ne  10
mw mw 414
mw  s  50
mw  w  40
s ne   8
s mw  22
s  s 578
s  w  22
w ne   7
w mw   6
w  s  27
w  w 301
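
Because the row and column factors share the same four labels, the listing above can be pivoted into a square matrix. The sketch below does that and computes the margins and the diagonal total. It assumes the labels should be ordered as they first appear in the listing, and the function name `to_matrix` is hypothetical.

```python
# Rows copied from data set 21 above.
ROWS = """\
row column count
ne ne 266
ne mw  15
ne  s  61
ne  w  28
mw ne  10
mw mw 414
mw  s  50
mw  w  40
s ne   8
s mw  22
s  s 578
s  w  22
w ne   7
w mw   6
w  s  27
w  w 301
"""


def to_matrix(text):
    """Pivot long-form (row, column, count) records into a square matrix."""
    records = [ln.split() for ln in text.strip().splitlines()[1:]]
    labels = []
    for r, c, _ in records:
        for lab in (r, c):
            if lab not in labels:  # keep first-appearance order
                labels.append(lab)
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for r, c, n in records:
        matrix[index[r]][index[c]] = int(n)
    return labels, matrix


labels, matrix = to_matrix(ROWS)
row_totals = [sum(r) for r in matrix]
col_totals = [sum(col) for col in zip(*matrix)]
diagonal = sum(matrix[i][i] for i in range(len(labels)))
print(labels, row_totals, col_totals, diagonal)
```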


22. Depression data set of Table 12.1 in case form


case severity treat time outcome   (outcome = 1 is normal)
1  0  0  0  1
1  0  0  1  1
1  0  0  2  1
2  0  0  0  1
2  0  0  1  1
2  0  0  2  1
3  0  0  0  1
3  0  0  1  1
3  0  0  2  1
4  0  0  0  1
4  0  0  1  1
4  0  0  2  1
5  0  0  0  1
5  0  0  1  1
5  0  0  2  1
6  0  0  0  1
6  0  0  1  1
6  0  0  2  1
7  0  0  0  1
7  0  0  1  1
7  0  0  2  1
8  0  0  0  1
8  0  0  1  1
8  0  0  2  1
9  0  0  0  1
9  0  0  1  1
9  0  0  2  1
10  0  0  0  1
10  0  0  1  1
10  0  0  2  1
11  0  0  0  1
11  0  0  1  1
11  0  0  2  1
12  0  0  0  1
12  0  0  1  1
12  0  0  2  1
13  0  0  0  1
13  0  0  1  1
13  0  0  2  1
14  0  0  0  1
14  0  0  1  1
14  0  0  2  1
15  0  0  0  1
15  0  0  1  1
15  0  0  2  1
16  0  0  0  1
16  0  0  1  1
16  0  0  2  1
17  0  0  0  1
17  0  0  1  1
17  0  0  2  0
18  0  0  0  1
18  0  0  1  1
18  0  0  2  0
19  0  0  0  1
19  0  0  1  1
19  0  0  2  0
20  0  0  0  1
20  0  0  1  1
20  0  0  2  0
21  0  0  0  1
21  0  0  1  1
21  0  0  2  0
22  0  0  0  1
22  0  0  1  1
22  0  0  2  0
23  0  0  0  1
23  0  0  1  1
23  0  0  2  0
24  0  0  0  1
24  0  0  1  1
24  0  0  2  0
25  0  0  0  1
25  0  0  1  1
25  0  0  2  0
26  0  0  0  1
26  0  0  1  1
26  0  0  2  0
27  0  0  0  1
27  0  0  1  1
27  0  0  2  0
28  0  0  0  1
28  0  0  1  1
28  0  0  2  0
29  0  0  0  1
29  0  0  1  1
29  0  0  2  0
30  0  0  0  1
30  0  0  1  0
30  0  0  2  1
31  0  0  0  1
31  0  0  1  0
31  0  0  2  1
32  0  0  0  1
32  0  0  1  0
32  0  0  2  1
33  0  0  0  1
33  0  0  1  0
33  0  0  2  1
34  0  0  0  1
34  0  0  1  0
34  0  0  2  1
35  0  0  0  1
35  0  0  1  0
35  0  0  2  1
36  0  0  0  1
36  0  0  1  0
36  0  0  2  1
37  0  0  0  1
37  0  0  1  0
37  0  0  2  1
38  0  0  0  1
38  0  0  1  0
38  0  0  2  1
39  0  0  0  1
39  0  0  1  0
39  0  0  2  0
40  0  0  0  1
40  0  0  1  0
40  0  0  2  0
41  0  0  0  1
41  0  0  1  0
41  0  0  2  0
42  0  0  0  0
42  0  0  1  1
42  0  0  2  1
43  0  0  0  0
43  0  0  1  1
43  0  0  2  1
44  0  0  0  0
44  0  0  1  1
44  0  0  2  1
45  0  0  0  0
45  0  0  1  1
45  0  0  2  1
46  0  0  0  0
46  0  0  1  1
46  0  0  2  1
47  0  0  0  0
47  0  0  1  1
47  0  0  2  1
48  0  0  0  0
48  0  0  1  1
48  0  0  2  1
49  0  0  0  0
49  0  0  1  1
49  0  0  2  1
50  0  0  0  0
50  0  0  1  1
50  0  0  2  1
51  0  0  0  0
51  0  0  1  1
51  0  0  2  1
52  0  0  0  0
52  0  0  1  1
52  0  0  2  1
53  0  0  0  0
53  0  0  1  1
53  0  0  2  1
54  0  0  0  0
54  0  0  1  1
54  0  0  2  1
55  0  0  0  0
55  0  0  1  1
55  0  0  2  1
56  0  0  0  0
56  0  0  1  1
56  0  0  2  0
57  0  0  0  0
57  0  0  1  1
57  0  0  2  0
58  0  0  0  0
58  0  0  1  1
58  0  0  2  0
59  0  0  0  0
59  0  0  1  1
59  0  0  2  0
60  0  0  0  0
60  0  0  1  0
60  0  0  2  1
61  0  0  0  0
61  0  0  1  0
61  0  0  2  1
62  0  0  0  0
62  0  0  1  0
62  0  0  2  1
63  0  0  0  0
63  0  0  1  0
63  0  0  2  1
64  0  0  0  0
64  0  0  1  0
64  0  0  2  1
65  0  0  0  0
65  0  0  1  0
65  0  0  2  1
66  0  0  0  0
66  0  0  1  0
66  0  0  2  1
67  0  0  0  0
67  0  0  1  0
67  0  0  2  1
68  0  0  0  0
68  0  0  1  0
68  0  0  2  1
69  0  0  0  0
69  0  0  1  0
69  0  0  2  1
70  0  0  0  0
70  0  0  1  0
70  0  0  2  1
71  0  0  0  0
71  0  0  1  0
71  0  0  2  1
72  0  0  0  0
72  0  0  1  0
72  0  0  2  1
73  0  0  0  0
73  0  0  1  0
73  0  0  2  1
74  0  0  0  0
74  0  0  1  0
74  0  0  2  1
75  0  0  0  0
75  0  0  1  0
75  0  0  2  0
336  0  0  0  0
336  0  0  1  0
336  0  0  2  0
337  0  0  0  0
337  0  0  1  0
337  0  0  2  0
338  0  0  0  0
338  0  0  1  0
338  0  0  2  0
339  0  0  0  0
339  0  0  1  0
339  0  0  2  0
340  0  0  0  0
340  0  0  1  0
340  0  0  2  0

76  0  1  0  1
76  0  1  1  1
76  0  1  2  1
77  0  1  0  1
77  0  1  1  1
77  0  1  2  1
78  0  1  0  1
78  0  1  1  1
78  0  1  2  1
79  0  1  0  1
79  0  1  1  1
79  0  1  2  1
80  0  1  0  1
80  0  1  1  1
80  0  1  2  1
81  0  1  0  1
81  0  1  1  1
81  0  1  2  1
82  0  1  0  1
82  0  1  1  1
82  0  1  2  1
83  0  1  0  1
83  0  1  1  1
83  0  1  2  1
84  0  1  0  1
84  0  1  1  1
84  0  1  2  1
85  0  1  0  1
85  0  1  1  1
85  0  1  2  1
86  0  1  0  1
86  0  1  1  1
86  0  1  2  1
87  0  1  0  1
87  0  1  1  1
87  0  1  2  1
88  0  1  0  1
88  0  1  1  1
88  0  1  2  1
89  0  1  0  1
89  0  1  1  1
89  0  1  2  1
90  0  1  0  1
90  0  1  1  1
90  0  1  2  1
91  0  1  0  1
91  0  1  1  1
91  0  1  2  1
92  0  1  0  1
92  0  1  1  1
92  0  1  2  1
93  0  1  0  1
93  0  1  1  1
93  0  1  2  1
94  0  1  0  1
94  0  1  1  1
94  0  1  2  1
95  0  1  0  1
95  0  1  1  1
95  0  1  2  1
96  0  1  0  1
96  0  1  1  1
96  0  1  2  1
97  0  1  0  1
97  0  1  1  1
97  0  1  2  1
98  0  1  0  1
98  0  1  1  1
98  0  1  2  1
99  0  1  0  1
99  0  1  1  1
99  0  1  2  1
100  0  1  0  1
100  0  1  1  1
100  0  1  2  1
101  0  1  0  1
101  0  1  1  1
101  0  1  2  1
102  0  1  0  1
102  0  1  1  1
102  0  1  2  1
103  0  1  0  1
103  0  1  1  1
103  0  1  2  1
104  0  1  0  1
104  0  1  1  1
104  0  1  2  1
105  0  1  0  1
105  0  1  1  1
105  0  1  2  1
106  0  1  0  1
106  0  1  1  1
106  0  1  2  1
107  0  1  0  1
107  0  1  1  0
107  0  1  2  1
108  0  1  0  1
108  0  1  1  0
108  0  1  2  1
109  0  1  0  1
109  0  1  1  0
109  0  1  2  1
110  0  1  0  1
110  0  1  1  0
110  0  1  2  1
111  0  1  0  1
111  0  1  1  0
111  0  1  2  1
112  0  1  0  1
112  0  1  1  0
112  0  1  2  1
113  0  1  0  0
113  0  1  1  1
113  0  1  2  1
114  0  1  0  0
114  0  1  1  1
114  0  1  2  1
115  0  1  0  0
115  0  1  1  1
115  0  1  2  1
116  0  1  0  0
116  0  1  1  1
116  0  1  2  1
117  0  1  0  0
117  0  1  1  1
117  0  1  2  1
118  0  1  0  0
118  0  1  1  1
118  0  1  2  1
119  0  1  0  0
119  0  1  1  1
119  0  1  2  1
120  0  1  0  0
120  0  1  1  1
120  0  1  2  1
121  0  1  0  0
121  0  1  1  1
121  0  1  2  1
122  0  1  0  0
122  0  1  1  1
122  0  1  2  1
123  0  1  0  0
123  0  1  1  1
123  0  1  2  1
124  0  1  0  0
124  0  1  1  1
124  0  1  2  1
125  0  1  0  0
125  0  1  1  1
125  0  1  2  1
126  0  1  0  0
126  0  1  1  1
126  0  1  2  1
127  0  1  0  0
127  0  1  1  1
127  0  1  2  1
128  0  1  0  0
128  0  1  1  1
128  0  1  2  1
129  0  1  0  0
129  0  1  1  1
129  0  1  2  1
130  0  1  0  0
130  0  1  1  1
130  0  1  2  1
131  0  1  0  0
131  0  1  1  1
131  0  1  2  1
132  0  1  0  0
132  0  1  1  1
132  0  1  2  1
133  0  1  0  0
133  0  1  1  1
133  0  1  2  1
134  0  1  0  0
134  0  1  1  1
134  0  1  2  1
135  0  1  0  0
135  0  1  1  1
135  0  1  2  0
136  0  1  0  0
136  0  1  1  1
136  0  1  2  0
137  0  1  0  0
137  0  1  1  0
137  0  1  2  1
138  0  1  0  0
138  0  1  1  0
138  0  1  2  1
139  0  1  0  0
139  0  1  1  0
139  0  1  2  1
140  0  1  0  0
140  0  1  1  0
140  0  1  2  1
141  0  1  0  0
141  0  1  1  0
141  0  1  2  1
142  0  1  0  0
142  0  1  1  0
142  0  1  2  1
143  0  1  0  0
143  0  1  1  0
143  0  1  2  1
144  0  1  0  0
144  0  1  1  0
144  0  1  2  1
145  0  1  0  0
145  0  1  1  0
145  0  1  2  1

146  1  0  0  1
146  1  0  1  1
146  1  0  2  1
147  1  0  0  1
147  1  0  1  1
147  1  0  2  1
148  1  0  0  1
148  1  0  1  1
148  1  0  2  0
149  1  0  0  1
149  1  0  1  1
149  1  0  2  0
150  1  0  0  1
150  1  0  1  0
150  1  0  2  1
151  1  0  0  1
151  1  0  1  0
151  1  0  2  1
152  1  0  0  1
152  1  0  1  0
152  1  0  2  1
153  1  0  0  1
153  1  0  1  0
153  1  0  2  1
154  1  0  0  1
154  1  0  1  0
154  1  0  2  1
155  1  0  0  1
155  1  0  1  0
155  1  0  2  1
156  1  0  0  1
156  1  0  1  0
156  1  0  2  1
157  1  0  0  1
157  1  0  1  0
157  1  0  2  1
158  1  0  0  1
158  1  0  1  0
158  1  0  2  0
159  1  0  0  1
159  1  0  1  0
159  1  0  2  0
160  1  0  0  1
160  1  0  1  0
160  1  0  2  0
161  1  0  0  1
161  1  0  1  0
161  1  0  2  0
162  1  0  0  1
162  1  0  1  0
162  1  0  2  0
163  1  0  0  1
163  1  0  1  0
163  1  0  2  0
164  1  0  0  1
164  1  0  1  0
164  1  0  2  0
165  1  0  0  1
165  1  0  1  0
165  1  0  2  0
166  1  0  0  1
166  1  0  1  0
166  1  0  2  0
167  1  0  0  0
167  1  0  1  1
167  1  0  2  1
168  1  0  0  0
168  1  0  1  1
168  1  0  2  1
169  1  0  0  0
169  1  0  1  1
169  1  0  2  1
170  1  0  0  0
170  1  0  1  1
170  1  0  2  1
171  1  0  0  0
171  1  0  1  1
171  1  0  2  1
172  1  0  0  0
172  1  0  1  1
172  1  0  2  1
173  1  0  0  0
173  1  0  1  1
173  1  0  2  1
174  1  0  0  0
174  1  0  1  1
174  1  0  2  1
175  1  0  0  0
175  1  0  1  1
175  1  0  2  1
176  1  0  0  0
176  1  0  1  1
176  1  0  2  0
177  1  0  0  0
177  1  0  1  1
177  1  0  2  0
178  1  0  0  0
178  1  0  1  1
178  1  0  2  0
179  1  0  0  0
179  1  0  1  1
179  1  0  2  0
180  1  0  0  0
180  1  0  1  1
180  1  0  2  0
181  1  0  0  0
181  1  0  1  1
181  1  0  2  0
182  1  0  0  0
182  1  0  1  1
182  1  0  2  0
183  1  0  0  0
183  1  0  1  1
183  1  0  2  0
184  1  0  0  0
184  1  0  1  1
184  1  0  2  0
185  1  0  0  0
185  1  0  1  1
185  1  0  2  0
186  1  0  0  0
186  1  0  1  1
186  1  0  2  0
187  1  0  0  0
187  1  0  1  1
187  1  0  2  0
188  1  0  0  0
188  1  0  1  1
188  1  0  2  0
189  1  0  0  0
189  1  0  1  1
189  1  0  2  0
190  1  0  0  0
190  1  0  1  1
190  1  0  2  0
191  1  0  0  0
191  1  0  1  0
191  1  0  2  1
192  1  0  0  0
192  1  0  1  0
192  1  0  2  1
193  1  0  0  0
193  1  0  1  0
193  1  0  2  1
194  1  0  0  0
194  1  0  1  0
194  1  0  2  1
195  1  0  0  0
195  1  0  1  0
195  1  0  2  1
196  1  0  0  0
196  1  0  1  0
196  1  0  2  1
197  1  0  0  0
197  1  0  1  0
197  1  0  2  1
198  1  0  0  0
198  1  0  1  0
198  1  0  2  1
199  1  0  0  0
199  1  0  1  0
199  1  0  2  1
200  1  0  0  0
200  1  0  1  0
200  1  0  2  1
201  1  0  0  0
201  1  0  1  0
201  1  0  2  1
202  1  0  0  0
202  1  0  1  0
202  1  0  2  1
203  1  0  0  0
203  1  0  1  0
203  1  0  2  1
204  1  0  0  0
204  1  0  1  0
204  1  0  2  1
205  1  0  0  0
205  1  0  1  0
205  1  0  2  1
206  1  0  0  0
206  1  0  1  0
206  1  0  2  1
207  1  0  0  0
207  1  0  1  0
207  1  0  2  1
208  1  0  0  0
208  1  0  1  0
208  1  0  2  1
209  1  0  0  0
209  1  0  1  0
209  1  0  2  1
210  1  0  0  0
210  1  0  1  0
210  1  0  2  1
211  1  0  0  0
211  1  0  1  0
211  1  0  2  1
212  1  0  0  0
212  1  0  1  0
212  1  0  2  1
213  1  0  0  0
213  1  0  1  0
213  1  0  2  1
214  1  0  0  0
214  1  0  1  0
214  1  0  2  1
215  1  0  0  0
215  1  0  1  0
215  1  0  2  1
216  1  0  0  0
216  1  0  1  0
216  1  0  2  1
217  1  0  0  0
217  1  0  1  0
217  1  0  2  1
218  1  0  0  0
218  1  0  1  0
218  1  0  2  0
219  1  0  0  0
219  1  0  1  0
219  1  0  2  0
220  1  0  0  0
220  1  0  1  0
220  1  0  2  0
221  1  0  0  0
221  1  0  1  0
221  1  0  2  0
222  1  0  0  0
222  1  0  1  0
222  1  0  2  0
223  1  0  0  0
223  1  0  1  0
223  1  0  2  0
224  1  0  0  0
224  1  0  1  0
224  1  0  2  0
225  1  0  0  0
225  1  0  1  0
225  1  0  2  0
226  1  0  0  0
226  1  0  1  0
226  1  0  2  0
227  1  0  0  0
227  1  0  1  0
227  1  0  2  0
228  1  0  0  0
228  1  0  1  0
228  1  0  2  0
229  1  0  0  0
229  1  0  1  0
229  1  0  2  0
230  1  0  0  0
230  1  0  1  0
230  1  0  2  0
231  1  0  0  0
231  1  0  1  0
231  1  0  2  0
232  1  0  0  0
232  1  0  1  0
232  1  0  2  0
233  1  0  0  0
233  1  0  1  0
233  1  0  2  0
234  1  0  0  0
234  1  0  1  0
234  1  0  2  0
235  1  0  0  0
235  1  0  1  0
235  1  0  2  0
236  1  0  0  0
236  1  0  1  0
236  1  0  2  0
237  1  0  0  0
237  1  0  1  0
237  1  0  2  0
238  1  0  0  0
238  1  0  1  0
238  1  0  2  0
239  1  0  0  0
239  1  0  1  0
239  1  0  2  0
240  1  0  0  0
240  1  0  1  0
240  1  0  2  0
241  1  0  0  0
241  1  0  1  0
241  1  0  2  0
242  1  0  0  0
242  1  0  1  0
242  1  0  2  0
243  1  0  0  0
243  1  0  1  0
243  1  0  2  0
244  1  0  0  0
244  1  0  1  0
244  1  0  2  0
245  1  0  0  0
245  1  0  1  0
245  1  0  2  0

246  1  1  0  1
246  1  1  1  1
246  1  1  2  1
247  1  1  0  1
247  1  1  1  1
247  1  1  2  1
248  1  1  0  1
248  1  1  1  1
248  1  1  2  1
249  1  1  0  1
249  1  1  1  1
249  1  1  2  1
250  1  1  0  1
250  1  1  1  1
250  1  1  2  1
251  1  1  0  1
251  1  1  1  1
251  1  1  2  1
252  1  1  0  1
252  1  1  1  1
252  1  1  2  1
253  1  1  0  1
253  1  1  1  1
253  1  1  2  0
254  1  1  0  1
254  1  1  1  1
254  1  1  2  0
255  1  1  0  1
255  1  1  1  0
255  1  1  2  1
256  1  1  0  1
256  1  1  1  0
256  1  1  2  1
257  1  1  0  1
257  1  1  1  0
257  1  1  2  1
258  1  1  0  1
258  1  1  1  0
258  1  1  2  1
259  1  1  0  1
259  1  1  1  0
259  1  1  2  1
260  1  1  0  1
260  1  1  1  0
260  1  1  2  0
261  1  1  0  1
261  1  1  1  0
261  1  1  2  0
262  1  1  0  0
262  1  1  1  1
262  1  1  2  1
263  1  1  0  0
263  1  1  1  1
263  1  1  2  1
264  1  1  0  0
264  1  1  1  1
264  1  1  2  1
265  1  1  0  0
265  1  1  1  1
265  1  1  2  1
266  1  1  0  0
266  1  1  1  1
266  1  1  2  1
267  1  1  0  0
267  1  1  1  1
267  1  1  2  1
268  1  1  0  0
268  1  1  1  1
268  1  1  2  1
269  1  1  0  0
269  1  1  1  1
269  1  1  2  1
270  1  1  0  0
270  1  1  1  1
270  1  1  2  1
271  1  1  0  0
271  1  1  1  1
271  1  1  2  1
272  1  1  0  0
272  1  1  1  1
272  1  1  2  1
273  1  1  0  0
273  1  1  1  1
273  1  1  2  1
274  1  1  0  0
274  1  1  1  1
274  1  1  2  1
275  1  1  0  0
275  1  1  1  1
275  1  1  2  1
276  1  1  0  0
276  1  1  1  1
276  1  1  2  1
277  1  1  0  0
277  1  1  1  1
277  1  1  2  1
278  1  1  0  0
278  1  1  1  1
278  1  1  2  1
279  1  1  0  0
279  1  1  1  1
279  1  1  2  1
280  1  1  0  0
280  1  1  1  1
280  1  1  2  1
281  1  1  0  0
281  1  1  1  1
281  1  1  2  1
282  1  1  0  0
282  1  1  1  1
282  1  1  2  1
283  1  1  0  0
283  1  1  1  1
283  1  1  2  1
284  1  1  0  0
284  1  1  1  1
284  1  1  2  1
285  1  1  0  0
285  1  1  1  1
285  1  1  2  1
286  1  1  0  0
286  1  1  1  1
286  1  1  2  1
287  1  1  0  0
287  1  1  1  1
287  1  1  2  1
288  1  1  0  0
288  1  1  1  1
288  1  1  2  1
289  1  1  0  0
289  1  1  1  1
289  1  1  2  1
290  1  1  0  0
290  1  1  1  1
290  1  1  2  1
291  1  1  0  0
291  1  1  1  1
291  1  1  2  1
292  1  1  0  0
292  1  1  1  1
292  1  1  2  1
293  1  1  0  0
293  1  1  1  1
293  1  1  2  0
294  1  1  0  0
294  1  1  1  1
294  1  1  2  0
295  1  1  0  0
295  1  1  1  1
295  1  1  2  0
296  1  1  0  0
296  1  1  1  1
296  1  1  2  0
297  1  1  0  0
297  1  1  1  1
297  1  1  2  0
298  1  1  0  0
298  1  1  1  0
298  1  1  2  1
299  1  1  0  0
299  1  1  1  0
299  1  1  2  1
300  1  1  0  0
300  1  1  1  0
300  1  1  2  1
301  1  1  0  0
301  1  1  1  0
301  1  1  2  1
302  1  1  0  0
302  1  1  1  0
302  1  1  2  1
303  1  1  0  0
303  1  1  1  0
303  1  1  2  1
304  1  1  0  0
304  1  1  1  0
304  1  1  2  1
305  1  1  0  0
305  1  1  1  0
305  1  1  2  1
306  1  1  0  0
306  1  1  1  0
306  1  1  2  1
307  1  1  0  0
307  1  1  1  0
307  1  1  2  1
308  1  1  0  0
308  1  1  1  0
308  1  1  2  1
309  1  1  0  0
309  1  1  1  0
309  1  1  2  1
310  1  1  0  0
310  1  1  1  0
310  1  1  2  1
311  1  1  0  0
311  1  1  1  0
311  1  1  2  1
312  1  1  0  0
312  1  1  1  0
312  1  1  2  1
313  1  1  0  0
313  1  1  1  0
313  1  1  2  1
314  1  1  0  0
314  1  1  1  0
314  1  1  2  1
315  1  1  0  0
315  1  1  1  0
315  1  1  2  1
316  1  1  0  0
316  1  1  1  0
316  1  1  2  1
317  1  1  0  0
317  1  1  1  0
317  1  1  2  1
318  1  1  0  0
318  1  1  1  0
318  1  1  2  1
319  1  1  0  0
319  1  1  1  0
319  1  1  2  1
320  1  1  0  0
320  1  1  1  0
320  1  1  2  1
321  1  1  0  0
321  1  1  1  0
321  1  1  2  1
322  1  1  0  0
322  1  1  1  0
322  1  1  2  1
323  1  1  0  0
323  1  1  1  0
323  1  1  2  1
324  1  1  0  0
324  1  1  1  0
324  1  1  2  1
325  1  1  0  0
325  1  1  1  0
325  1  1  2  1
326  1  1  0  0
326  1  1  1  0
326  1  1  2  1
327  1  1  0  0
327  1  1  1  0
327  1  1  2  1
328  1  1  0  0
328  1  1  1  0
328  1  1  2  1
329  1  1  0  0
329  1  1  1  0
329  1  1  2  1
330  1  1  0  0
330  1  1  1  0
330  1  1  2  0
331  1  1  0  0
331  1  1  1  0
331  1  1  2  0
332  1  1  0  0
332  1  1  1  0
332  1  1  2  0
333  1  1  0  0
333  1  1  1  0
333  1  1  2  0
334  1  1  0  0
334  1  1  1  0
334  1  1  2  0
335  1  1  0  0
335  1  1  1  0
335  1  1  2  0
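
In case form, each subject contributes one row per time point, so a common first step is to collapse the rows into one response profile per case. The sketch below does this for a handful of cases copied from the listing above. It is only an illustration under the assumption that every case has one row for each of times 0, 1 and 2, and the function name `profiles` is hypothetical.

```python
from collections import Counter

# A small subset of data set 22 (cases 1, 17, 30, 42 and 246), copied from above.
SAMPLE = """\
case severity treat time outcome
1  0  0  0  1
1  0  0  1  1
1  0  0  2  1
17  0  0  0  1
17  0  0  1  1
17  0  0  2  0
30  0  0  0  1
30  0  0  1  0
30  0  0  2  1
42  0  0  0  0
42  0  0  1  1
42  0  0  2  1
246  1  1  0  1
246  1  1  1  1
246  1  1  2  1
"""


def profiles(text):
    """Map case -> ((severity, treat), outcomes ordered by time)."""
    rows = [ln.split() for ln in text.strip().splitlines()[1:]]
    collected = {}
    for case, severity, treat, time, outcome in rows:
        group = (int(severity), int(treat))
        _, seq = collected.setdefault(int(case), (group, {}))
        seq[int(time)] = int(outcome)
    return {
        case: (group, tuple(seq[t] for t in sorted(seq)))
        for case, (group, seq) in collected.items()
    }


by_case = profiles(SAMPLE)
# Number of cases sharing each (group, response profile) combination.
pattern_counts = Counter(by_case.values())
print(by_case[17], pattern_counts)
```

Counting the profiles within each severity-by-treatment group recovers a grouped table from the case-form rows.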


23. Insomnia data set of Table 12.3


case treat occasion outcome count
1       1     0        1   7
1       1     1        1   7
2       1     0        1   7
2       1     1        1   7
3       1     0        1   7
3       1     1        1   7
4       1     0        1   7
4       1     1        1   7
5       1     0        1   7
5       1     1        1   7
6       1     0        1   7
6       1     1        1   7
7       1     0        1   7
7       1     1        1   7
8       1     0        1   4
8       1     1        2   4
9       1     0        1   4
9       1     1        2   4
10       1     0        1   4
10       1     1        2   4
11       1     0        1   4
11       1     1        2   4
12       1     0        1   1
12       1     1        3   1
13       1     0        2  11
13       1     1        1  11
14       1     0        2  11
14       1     1        1  11
15       1     0        2  11
15       1     1        1  11
16       1     0        2  11
16       1     1        1  11
17       1     0        2  11
17       1     1        1  11
18       1     0        2  11
18       1     1        1  11
19       1     0        2  11
19       1     1        1  11
20       1     0        2  11
20       1     1        1  11
21       1     0        2  11
21       1     1        1  11
22       1     0        2  11
22       1     1        1  11
23       1     0        2  11
23       1     1        1  11
24       1     0        2   5
24       1     1        2   5
25       1     0        2   5
25       1     1        2   5
26       1     0        2   5
26       1     1        2   5
27       1     0        2   5
27       1     1        2   5
28       1     0        2   5
28       1     1        2   5
29       1     0        2   2
29       1     1        3   2
30       1     0        2   2
30       1     1        3   2
31       1     0        2   2
31       1     1        4   2
32       1     0        2   2
32       1     1        4   2
33       1     0        3  13
33       1     1        1  13
34       1     0        3  13
34       1     1        1  13
35       1     0        3  13
35       1     1        1  13
36       1     0        3  13
36       1     1        1  13
37       1     0        3  13
37       1     1        1  13
38       1     0        3  13
38       1     1        1  13
39       1     0        3  13
39       1     1        1  13
40       1     0        3  13
40       1     1        1  13
41       1     0        3  13
41       1     1        1  13
42       1     0        3  13
42       1     1        1  13
43       1     0        3  13
43       1     1        1  13
44       1     0        3  13
44       1     1        1  13
45       1     0        3  13
45       1     1        1  13
46       1     0        3  23
46       1     1        2  23
47       1     0        3  23
47       1     1        2  23
48       1     0        3  23
48       1     1        2  23
49       1     0        3  23
49       1     1        2  23
50       1     0        3  23
50       1     1        2  23
51       1     0        3  23
51       1     1        2  23
52       1     0        3  23
52       1     1        2  23
53       1     0        3  23
53       1     1        2  23
54       1     0        3  23
54       1     1        2  23
55       1     0        3  23
55       1     1        2  23
56       1     0        3  23
56       1     1        2  23
57       1     0        3  23
57       1     1        2  23
58       1     0        3  23
58       1     1        2  23
59       1     0        3  23
59       1     1        2  23
60       1     0        3  23
60       1     1        2  23
61       1     0        3  23
61       1     1        2  23
62       1     0        3  23
62       1     1        2  23
63       1     0        3  23
63       1     1        2  23
64       1     0        3  23
64       1     1        2  23
65       1     0        3  23
65       1     1        2  23
66       1     0        3  23
66       1     1        2  23
67       1     0        3  23
67       1     1        2  23
68       1     0        3  23
68       1     1        2  23
69       1     0        3   3
69       1     1        3   3
70       1     0        3   3
70       1     1        3   3
71       1     0        3   3
71       1     1        3   3
72       1     0        3   1
72       1     1        4   1
73       1     0        4   9
73       1     1        1   9
74       1     0        4   9
74       1     1        1   9
75       1     0        4   9
75       1     1        1   9
76       1     0        4   9
76       1     1        1   9
77       1     0        4   9
77       1     1        1   9
78       1     0        4   9
78       1     1        1   9
79       1     0        4   9
79       1     1        1   9
80       1     0        4   9
80       1     1        1   9
81       1     0        4   9
81       1     1        1   9
82       1     0        4  17
82       1     1        2  17
83       1     0        4  17
83       1     1        2  17
84       1     0        4  17
84       1     1        2  17
85       1     0        4  17
85       1     1        2  17
86       1     0        4  17
86       1     1        2  17
87       1     0        4  17
87       1     1        2  17
88       1     0        4  17
88       1     1        2  17
89       1     0        4  17
89       1     1        2  17
90       1     0        4  17
90       1     1        2  17
91       1     0        4  17
91       1     1        2  17
92       1     0        4  17
92       1     1        2  17
93       1     0        4  17
93       1     1        2  17
94       1     0        4  17
94       1     1        2  17
95       1     0        4  17
95       1     1        2  17
96       1     0        4  17
96       1     1        2  17
97       1     0        4  17
97       1     1        2  17
98       1     0        4  17
98       1     1        2  17
99       1     0        4  13
99       1     1        3  13
100       1     0        4  13
100       1     1        3  13
101       1     0        4  13
101       1     1        3  13
102       1     0        4  13
102       1     1        3  13
103       1     0        4  13
103       1     1        3  13
104       1     0        4  13
104       1     1        3  13
105       1     0        4  13
105       1     1        3  13
106       1     0        4  13
106       1     1        3  13
107       1     0        4  13
107       1     1        3  13
108       1     0        4  13
108       1     1        3  13
109       1     0        4  13
109       1     1        3  13
110       1     0        4  13
110       1     1        3  13
111       1     0        4  13
111       1     1        3  13
112       1     0        4   8
112       1     1        4   8
113       1     0        4   8
113       1     1        4   8
114       1     0        4   8
114       1     1        4   8
115       1     0        4   8
115       1     1        4   8
116       1     0        4   8
116       1     1        4   8
117       1     0        4   8
117       1     1        4   8
118       1     0        4   8
118       1     1        4   8
119       1     0        4   8
119       1     1        4   8
120       0     0        1   7
120       0     1        1   7
121       0     0        1   7
121       0     1        1   7
122       0     0        1   7
122       0     1        1   7
123       0     0        1   7
123       0     1        1   7
124       0     0        1   7
124       0     1        1   7
125       0     0        1   7
125       0     1        1   7
126       0     0        1   7
126       0     1        1   7
128       0     0        1   4
128       0     1        2   4
129       0     0        1   4
129       0     1        2   4
130       0     0        1   4
130       0     1        2   4
131       0     0        1   4
131       0     1        2   4
132       0     0        1   2
132       0     1        3   2
133       0     0        1   2
133       0     1        3   2
134       0     0        1   1
134       0     1        4   1
135       0     0        2  14
135       0     1        1  14
136       0     0        2  14
136       0     1        1  14
137       0     0        2  14
137       0     1        1  14
138       0     0        2  14
138       0     1        1  14
139       0     0        2  14
139       0     1        1  14
140       0     0        2  14
140       0     1        1  14
141       0     0        2  14
141       0     1        1  14
142       0     0        2  14
142       0     1        1  14
143       0     0        2  14
143       0     1        1  14
144       0     0        2  14
144       0     1        1  14
145       0     0        2  14
145       0     1        1  14
146       0     0        2  14
146       0     1        1  14
147       0     0        2  14
147       0     1        1  14
148       0     0        2  14
148       0     1        1  14
149       0     0        2   5
149       0     1        2   5
150       0     0        2   5
150       0     1        2   5
151       0     0        2   5
151       0     1        2   5
152       0     0        2   5
152       0     1        2   5
153       0     0        2   5
153       0     1        2   5
154       0     0        2   1
154       0     1        3   1
155       0     0        3   6
155       0     1        1   6
156       0     0        3   6
156       0     1        1   6
157       0     0        3   6
157       0     1        1   6
158       0     0        3   6
158       0     1        1   6
159       0     0        3   6
159       0     1        1   6
160       0     0        3   6
160       0     1        1   6
161       0     0        3   9
161       0     1        2   9
162       0     0        3   9
162       0     1        2   9
163       0     0        3   9
163       0     1        2   9
164       0     0        3   9
164       0     1        2   9
165       0     0        3   9
165       0     1        2   9
166       0     0        3   9
166       0     1        2   9
167       0     0        3   9
167       0     1        2   9
168       0     0        3   9
168       0     1        2   9
169       0     0        3   9
169       0     1        2   9
170       0     0        3  18
170       0     1        3  18
171       0     0        3  18
171       0     1        3  18
172       0     0        3  18
172       0     1        3  18
173       0     0        3  18
173       0     1        3  18
174       0     0        3  18
174       0     1        3  18
175       0     0        3  18
175       0     1        3  18
176       0     0        3  18
176       0     1        3  18
177       0     0        3  18
177       0     1        3  18
178       0     0        3  18
178       0     1        3  18
179       0     0        3  18
179       0     1        3  18
180       0     0        3  18
180       0     1        3  18
181       0     0        3  18
181       0     1        3  18
182       0     0        3  18
182       0     1        3  18
183       0     0        3  18
183       0     1        3  18
184       0     0        3  18
184       0     1        3  18
185       0     0        3  18
185       0     1        3  18
186       0     0        3  18
186       0     1        3  18
187       0     0        3  18
187       0     1        3  18
188       0     0        3   2
188       0     1        4   2
189       0     0        3   2
189       0     1        4   2
190       0     0        4   4
190       0     1        1   4
191       0     0        4   4
191       0     1        1   4
192       0     0        4   4
192       0     1        1   4
193       0     0        4   4
193       0     1        1   4
194       0     0        4  11
194       0     1        2  11
195       0     0        4  11
195       0     1        2  11
196       0     0        4  11
196       0     1        2  11
197       0     0        4  11
197       0     1        2  11
198       0     0        4  11
198       0     1        2  11
199       0     0        4  11
199       0     1        2  11
200       0     0        4  11
200       0     1        2  11
201       0     0        4  11
201       0     1        2  11
202       0     0        4  11
202       0     1        2  11
203       0     0        4  11
203       0     1        2  11
204       0     0        4  11
204       0     1        2  11
205       0     0        4  14
205       0     1        3  14
206       0     0        4  14
206       0     1        3  14
207       0     0        4  14
207       0     1        3  14
208       0     0        4  14
208       0     1        3  14
209       0     0        4  14
209       0     1        3  14
210       0     0        4  14
210       0     1        3  14
211       0     0        4  14
211       0     1        3  14
212       0     0        4  14
212       0     1        3  14
213       0     0        4  14
213       0     1        3  14
214       0     0        4  14
214       0     1        3  14
215       0     0        4  14
215       0     1        3  14
216       0     0        4  14
216       0     1        3  14
217       0     0        4  14
217       0     1        3  14
218       0     0        4  14
218       0     1        3  14
219       0     0        4  22
219       0     1        4  22
220       0     0        4  22
220       0     1        4  22
221       0     0        4  22
221       0     1        4  22
222       0     0        4  22
222       0     1        4  22
223       0     0        4  22
223       0     1        4  22
224       0     0        4  22
224       0     1        4  22
225       0     0        4  22
225       0     1        4  22
226       0     0        4  22
226       0     1        4  22
227       0     0        4  22
227       0     1        4  22
228       0     0        4  22
228       0     1        4  22
229       0     0        4  22
229       0     1        4  22
230       0     0        4  22
230       0     1        4  22
231       0     0        4  22
231       0     1        4  22
232       0     0        4  22
232       0     1        4  22
233       0     0        4  22
233       0     1        4  22
234       0     0        4  22
234       0     1        4  22
235       0     0        4  22
235       0     1        4  22
236       0     0        4  22
236       0     1        4  22
237       0     0        4  22
237       0     1        4  22
238       0     0        4  22
238       0     1        4  22
239       0     0        4  22
239       0     1        4  22
127       0     0        4  22
127       0     1        4  22


24. Presidential election poll data set of Table 13.2


state pi n      x proportion
AK .379  5      3  0.6000000
AL .387 29      9  0.3103448
AR .389 17      2  0.1176471
AZ .449 35     13  0.3714286
CA .609 207   129  0.6231884
CO .537 37     16  0.4324324
CT .606 25     14  0.5600000
DC .925  4      4  1.0000000
DE .619  6      4  0.6666667
FL .509 128    73  0.5703125
GA .469 60     27  0.4500000
HI .718 7       6  0.8571429
IA .539 23      9  0.3913043
ID .359 10      1  0.1000000
IL .618 84     45  0.5357143
IN .498 42     20  0.4761905
KS .415 19      8  0.4210526
KY .412 28     10  0.3571429
LA .399 30     11  0.3666667
MA .618 47     20  0.4255319
MD .619 40     29  0.7250000
ME .577 11      9  0.8181818
MI .573 76     42  0.5526316
MN .541 44     22  0.5000000
MO .492 45     25  0.5555556
MS .430 20     11  0.5500000
MT .471 7       3  0.4285714
NC .497 66     23  0.3484848
ND .445 5       2  0.4000000
NE .416 12     10  0.8333333
NH .541 11      3  0.2727273
NJ .571 59     32  0.5423729
NM .569 13      7  0.5384615
NV .552 15      7  0.4666667
NY .629 116    77  0.6637931
OH .514 87     48  0.5517241
OK .344 22      4  0.1818182
OR .568 28     14  0.5000000
PA .545 92     53  0.5760870
RI .629 7       6  0.8571429
SC .449 29      9  0.3103448
SD .448 6       4  0.6666667
TN .418 40     17  0.4250000
TX .436 123    61  0.4959350
UT .342 15     10  0.6666667
VA .526 57     28  0.4912281
VT .675 5       4  0.8000000
WA .573 46     31  0.6739130
WI .562 45     26  0.5777778
WV .425 11      6  0.5454545
WY .325 4       2  0.5000000


25. Attitudes about abortion data set of Table 13.3, with data shown at the individual level


gender response question case
1        1        1    1
1        1        2    1
1        1        3    1
1        1        1    2
1        1        2    2
1        1        3    2
1        1        1    3
1        1        2    3
1        1        3    3
1        1        1    4
1        1        2    4
1        1        3    4
1        1        1    5
1        1        2    5
1        1        3    5
1        1        1    6
1        1        2    6
1        1        3    6
1        1        1    7
1        1        2    7
1        1        3    7
1        1        1    8
1        1        2    8
1        1        3    8
1        1        1    9
1        1        2    9
1        1        3    9
1        1        1   10
1        1        2   10
1        1        3   10
1        1        1   11
1        1        2   11
1        1        3   11
1        1        1   12
1        1        2   12
1        1        3   12
1        1        1   13
1        1        2   13
1        1        3   13
1        1        1   14
1        1        2   14
1        1        3   14
1        1        1   15
1        1        2   15
1        1        3   15
1        1        1   16
1        1        2   16
1        1        3   16
1        1        1   17
1        1        2   17
1        1        3   17
1        1        1   18
1        1        2   18
1        1        3   18
1        1        1   19
1        1        2   19
1        1        3   19
1        1        1   20
1        1        2   20
1        1        3   20
1        1        1   21
1        1        2   21
1        1        3   21
1        1        1   22
1        1        2   22
1        1        3   22
1        1        1   23
1        1        2   23
1        1        3   23
1        1        1   24
1        1        2   24
1        1        3   24
1        1        1   25
1        1        2   25
1        1        3   25
1        1        1   26
1        1        2   26
1        1        3   26
1        1        1   27
1        1        2   27
1        1        3   27
1        1        1   28
1        1        2   28
1        1        3   28
1        1        1   29
1        1        2   29
1        1        3   29
1        1        1   30
1        1        2   30
1        1        3   30
1        1        1   31
1        1        2   31
1        1        3   31
1        1        1   32
1        1        2   32
1        1        3   32
1        1        1   33
1        1        2   33
1        1        3   33
1        1        1   34
1        1        2   34
1        1        3   34
1        1        1   35
1        1        2   35
1        1        3   35
1        1        1   36
1        1        2   36
1        1        3   36
1        1        1   37
1        1        2   37
1        1        3   37
1        1        1   38
1        1        2   38
1        1        3   38
1        1        1   39
1        1        2   39
1        1        3   39
1        1        1   40
1        1        2   40
1        1        3   40
1        1        1   41
1        1        2   41
1        1        3   41
1        1        1   42
1        1        2   42
1        1        3   42
1        1        1   43
1        1        2   43
1        1        3   43
1        1        1   44
1        1        2   44
1        1        3   44
1        1        1   45
1        1        2   45
1        1        3   45
1        1        1   46
1        1        2   46
1        1        3   46
1        1        1   47
1        1        2   47
1        1        3   47
1        1        1   48
1        1        2   48
1        1        3   48
1        1        1   49
1        1        2   49
1        1        3   49
1        1        1   50
1        1        2   50
1        1        3   50
1        1        1   51
1        1        2   51
1        1        3   51
1        1        1   52
1        1        2   52
1        1        3   52
1        1        1   53
1        1        2   53
1        1        3   53
1        1        1   54
1        1        2   54
1        1        3   54
1        1        1   55
1        1        2   55
1        1        3   55
1        1        1   56
1        1        2   56
1        1        3   56
1        1        1   57
1        1        2   57
1        1        3   57
1        1        1   58
1        1        2   58
1        1        3   58
1        1        1   59
1        1        2   59
1        1        3   59
1        1        1   60
1        1        2   60
1        1        3   60
1        1        1   61
1        1        2   61
1        1        3   61
1        1        1   62
1        1        2   62
1        1        3   62
1        1        1   63
1        1        2   63
1        1        3   63
1        1        1   64
1        1        2   64
1        1        3   64
1        1        1   65
1        1        2   65
1        1        3   65
1        1        1   66
1        1        2   66
1        1        3   66
1        1        1   67
1        1        2   67
1        1        3   67
1        1        1   68
1        1        2   68
1        1        3   68
1        1        1   69
1        1        2   69
1        1        3   69
1        1        1   70
1        1        2   70
1        1        3   70
1        1        1   71
1        1        2   71
1        1        3   71
1        1        1   72
1        1        2   72
1        1        3   72
1        1        1   73
1        1        2   73
1        1        3   73
1        1        1   74
1        1        2   74
1        1        3   74
1        1        1   75
1        1        2   75
1        1        3   75
1        1        1   76
1        1        2   76
1        1        3   76
1        1        1   77
1        1        2   77
1        1        3   77
1        1        1   78
1        1        2   78
1        1        3   78
1        1        1   79
1        1        2   79
1        1        3   79
1        1        1   80
1        1        2   80
1        1        3   80
1        1        1   81
1        1        2   81
1        1        3   81
1        1        1   82
1        1        2   82
1        1        3   82
1        1        1   83
1        1        2   83
1        1        3   83
1        1        1   84
1        1        2   84
1        1        3   84
1        1        1   85
1        1        2   85
1        1        3   85
1        1        1   86
1        1        2   86
1        1        3   86
1        1        1   87
1        1        2   87
1        1        3   87
1        1        1   88
1        1        2   88
1        1        3   88
1        1        1   89
1        1        2   89
1        1        3   89
1        1        1   90
1        1        2   90
1        1        3   90
1        1        1   91
1        1        2   91
1        1        3   91
1        1        1   92
1        1        2   92
1        1        3   92
1        1        1   93
1        1        2   93
1        1        3   93
1        1        1   94
1        1        2   94
1        1        3   94
1        1        1   95
1        1        2   95
1        1        3   95
1        1        1   96
1        1        2   96
1        1        3   96
1        1        1   97
1        1        2   97
1        1        3   97
1        1        1   98
1        1        2   98
1        1        3   98
1        1        1   99
1        1        2   99
1        1        3   99
1        1        1  100
1        1        2  100
1        1        3  100
1        1        1  101
1        1        2  101
1        1        3  101
1        1        1  102
1        1        2  102
1        1        3  102
1        1        1  103
1        1        2  103
1        1        3  103
1        1        1  104
1        1        2  104
1        1        3  104
1        1        1  105
1        1        2  105
1        1        3  105
1        1        1  106
1        1        2  106
1        1        3  106
1        1        1  107
1        1        2  107
1        1        3  107
1        1        1  108
1        1        2  108
1        1        3  108
1        1        1  109
1        1        2  109
1        1        3  109
1        1        1  110
1        1        2  110
1        1        3  110
1        1        1  111
1        1        2  111
1        1        3  111
1        1        1  112
1        1        2  112
1        1        3  112
1        1        1  113
1        1        2  113
1        1        3  113
1        1        1  114
1        1        2  114
1        1        3  114
1        1        1  115
1        1        2  115
1        1        3  115
1        1        1  116
1        1        2  116
1        1        3  116
1        1        1  117
1        1        2  117
1        1        3  117
1        1        1  118
1        1        2  118
1        1        3  118
1        1        1  119
1        1        2  119
1        1        3  119
1        1        1  120
1        1        2  120
1        1        3  120
1        1        1  121
1        1        2  121
1        1        3  121
1        1        1  122
1        1        2  122
1        1        3  122
1        1        1  123
1        1        2  123
1        1        3  123
1        1        1  124
1        1        2  124
1        1        3  124
1        1        1  125
1        1        2  125
1        1        3  125
1        1        1  126
1        1        2  126
1        1        3  126
1        1        1  127
1        1        2  127
1        1        3  127
1        1        1  128
1        1        2  128
1        1        3  128
1        1        1  129
1        1        2  129
1        1        3  129
1        1        1  130
1        1        2  130
1        1        3  130
1        1        1  131
1        1        2  131
1        1        3  131
1        1        1  132
1        1        2  132


22. Attitudes about abortion data set of Table 13.3, in contingency table form


gender poor single any count
1 1 1 1 342
1 1 1 0 26
1 1 0 1 11
1 1 0 0 32
1 0 1 1 6
1 0 1 0 21
1 0 0 1 19
1 0 0 0 356
2 1 1 1 440
2 1 1 0 25
2 1 0 1 14
2 1 0 0 47
2 0 1 1 14
2 0 1 0 18
2 0 0 1 22
2 0 0 0 457


26. Attitudes toward leading crowd data set of Table 13.8


mem1 att1 mem2 att2 count
1 1 1 1 458
1 1 1 0 140
1 1 0 1 110
1 1 0 0 49
1 0 1 1 171
1 0 1 0 182
1 0 0 1 56
1 0 0 0 87
0 1 1 1 184
0 1 1 0 75
0 1 0 1 531
0 1 0 0 281
0 0 1 1 85
0 0 1 0 97
0 0 0 1 338
0 0 0 0 554


27. Data for example in Section 13.4.4 on cluster sampling


nbhd satis_1 satis_2
1 1 1
1 2 1
1 2 1
1 2 2
1 2 2
2 1 1
2 2 1
2 2 1
2 2 2
2 2 2
3 1 2
3 1 2
3 2 2
3 2 2
3 3 2
4 1 2
4 2 1
4 2 1
4 2 2
4 3 1
5 2 2
5 2 2
5 2 2
5 2 2
5 3 2
6 1 1
6 2 1
6 2 1
6 2 1
6 2 2
7 1 1
7 1 1
7 1 1
7 2 2
7 3 2
8 1 1
8 2 1
8 2 2
8 2 2
8 2 2
9 1 1
9 1 1
9 1 1
9 3 1
9 3 3
10 1 2
10 2 2
10 2 2
10 2 2
10 2 3
11 1 1
11 1 2
11 2 2
11 2 2
11 3 1
12 1 2
12 2 1
12 2 1
12 2 1
12 2 1
13 2 1
13 2 1
13 2 1
13 2 1
13 2 2
14 2 1
14 2 2
14 2 2
14 3 3
14 3 3
15 1 1
15 1 1
15 2 1
15 2 1
15 2 2
16 1 1
16 2 1
16 2 2
17 2 1
17 2 2
17 2 3
17 3 2
17 3 2
18 2 1
18 2 3
18 3 3
19 1 1
19 1 1
19 2 1
19 2 1
19 2 2
20 1 1
20 1 1
20 2 1
20 2 1
20 3 1


28. Clinical trials data set for Exercise 13.17


Center  Treatment  Much_Better  Better  Unchanged/Worse
1       Drug       13           7       6
1       Placebo    1            1       10
2       Drug       2            5       10
2       Placebo    2            2       1
3       Drug       11           23      7
3       Placebo    2            8       2
4       Drug       7            11      8
4       Placebo    0            3       2
5       Drug       15           3       5
5       Placebo    1            1       5
6       Drug       13           5       5
6       Placebo    4            0       1
7       Drug       7            4       13
7       Placebo    1            1       11
8       Drug       15           9       2
8       Placebo    3            2       2


29. Data for Exercise 14.15 on Buchanan vote in 2000


county      perot_vote_1996  total_vote_1996  buchanan_vote_2000  total_vote_2000

Alachua      8072      74484       262        84839
Baker         667       6634        73         8128
Bay          5922      51566       248        58563
Brevard     25249     195055       570       217543
Broward     38964     505015       789       571685
Calhoun       630       4158        90         5157
Charlott     7783      63014       182        66715
Citrus       7244      49585       270        56940
Clay         3281      47040       186        57116
Collier      6320      72511       122        91873
Columbia     1970      16326        89        18358
Desoto        965       7485        36         7771
Dixie         652       3795        29         4627
Duval       13844     253943       652       263371
Escambia     8587     107687       504       116220
Flagler      2185      20075        83        27017
Franklin      878       4569        33         4618
Gilchris      841       4795        29         5336
Gulf         1054       5986        71         6104
Hamilton      406       3670        23         3928
Hardee        851       6204        30         6210
Hendry       1135       8896        22         8112
Hernando     7272      58055       242        65033
Highland     3739      33699       127        35045
Hillsbor    25154     308190       847       351913
Holmes       1208       6801        76         7306
IndianRi     4635      43963       105        49458
Jackson      1602      15509       102        16246
Jefferso      393       4808        29         5624
Lafayett      316       2322        10         2493
Lake         8813      73911       289        88266
Lee         18389     165923       305       183593
Leon         6672      91685       282       102692
Levy         1774      11065        67        12614
Liberty       376       1342        39         2385
Manatee     10360      96741       272       109878
Marion      11340      90146       563       102178
Martin       5005      54646       108        61666
Monroe       4817      32450        47        33679
Nassau       1657      21159        90        23502
Okaloosa     5432      62963       267        70293
Okeechob     1666       9936        43         9819
Orange      18191     231061       446       278918
Osceola      6091      46484       145        55270
PalmBeac    30739     397231      3407       430762
Pasco       18011     133457       570       142108
Pinellas    36990     376218      1010       396092
Polk        14991     150140       538       167676
Putnam       3272      25145       148        26074
SantaRos     4957      42336       311        50111
Sarasota    14939     148950       305       160327
Seminole     9357     114878       194       136315
StJohns      4205      48539       229        60494
StLucie      8482      73897       124        77756
Sumter       2375      15397       114        22184
Suwannee     1874      12144       108        12369
Taylor       1140       7997        27         6791
Union         425       3462        29         3800
Volusia     17319     160118       396       182109
Wakulla      1091       7165        46         8545
Walton       2342      15514       120        18209
Washingt     1287       7859        88         7960


30. Election data set of Table 15.5 (1 = Dem, 0 = Rep)


State  e1  e2  e3  e4  e5  e6  e7  e8
Alab 0 0 0 0 0 0 0 0
Alas 0 0 0 0 0 0 0 0
Ariz 0 0 0 0 1 0 0 0
Arka 0 0 0 1 1 0 0 0
Cali 0 0 0 1 1 1 1 1
Colo 0 0 0 1 0 0 0 1
Conn 0 0 0 1 1 1 1 1
Dela 0 0 0 1 1 1 1 1
DisC 1 1 1 1 1 1 1 1
Flor 0 0 0 0 1 0 0 1
Geor 1 0 0 1 0 0 0 0
Hawa 1 0 1 1 1 1 1 1
Idah 0 0 0 0 0 0 0 0
Illi 0 0 0 1 1 1 1 1
Indi 0 0 0 0 0 0 0 1
Iowa 0 0 1 1 1 1 0 1
Kans 0 0 0 0 0 0 0 0
Kent 0 0 0 1 1 0 0 0
Loui 0 0 0 1 1 0 0 0
Main 0 0 0 1 1 1 1 1
Mary 1 0 0 1 1 1 1 1
Mass 0 0 1 1 1 1 1 1
Mich 0 0 0 1 1 1 1 1
Minn 1 1 1 1 1 1 1 1
Miss 0 0 0 0 0 0 0 0
Miso 0 0 0 1 1 0 0 0
Mont 0 0 0 1 0 0 0 0
Nebr 0 0 0 0 0 0 0 0
Neva 0 0 0 1 1 0 0 1
NewH 0 0 0 1 1 0 1 1
NewJ 0 0 0 1 1 1 1 1
NewM 0 0 0 1 1 1 0 1
NewY 0 0 1 1 1 1 1 1
NorC 0 0 0 0 0 0 0 1
NorD 0 0 0 0 0 0 0 0
Ohio 0 0 0 1 1 0 0 1
Okla 0 0 0 0 0 0 0 0
Oreg 0 0 1 1 1 1 1 1
Penn 0 0 0 1 1 1 1 1
Rhod 1 0 1 1 1 1 1 1
SouC 0 0 0 0 0 0 0 0
SouD 0 0 0 0 0 0 0 0
Tenn 0 0 0 1 1 0 0 1
Texa 0 0 0 0 0 0 0 0
Utah 0 0 0 0 0 0 0 0
Verm 0 0 0 1 1 1 1 1
Virg 0 0 0 0 0 0 0 1
Wash 0 0 1 1 1 1 1 1
WesV 1 0 1 1 1 0 0 0
Wisc 0 0 1 1 1 1 1 1
Wyom 0 0 0 0 0 0 0 0


31. Grounds for divorce data set of Table 15.6 for Exercise 15.10


state incompat cruelty desertn non_supp alcohol felony impotenc insanity separate
Alabama  1 1 1 1 1 1 1 1 1
Alaska  1 1 1 0 1 1 1 1 0
Arizona  1 0 0 0 0 0 0 0 0
Arkansas  0 1 1 1 1 1 1 1 1
California  1 0 0 0 0 0 0 1 0
Colorado  1 0 0 0 0 0 0 0 0
Connecticut  1 1 1 1 1 1 0 1 1
Delaware  1 0 0 0 0 0 0 0 1
Florida  1 0 0 0 0 0 0 1 0
Georgia  1 1 1 0 1 1 1 1 0
Hawaii  1 0 0 0 0 0 0 0 1
Idaho  1 1 1 1 1 1 0 1 1
Illinois  0 1 1 0 1 1 1 0 0
Indiana  1 0 0 0 0 1 1 1 0
Iowa  1 0 0 0 0 0 0 0 0
Kansas  1 1 1 0 1 1 1 1 0
Kentucky  1 0 0 0 0 0 0 0 0
Louisiana  0 0 0 0 0 1 0 0 1
Maine  1 1 1 1 1 0 1 1 0
Maryland  0 1 1 0 0 1 1 1 1
Massachusetts  1 1 1 1 1 1 1 0 1
Michigan  1 0 0 0 0 0 0 0 0
Minnesota  1 0 0 0 0 0 0 0 0
Mississippi  1 1 1 0 1 1 1 1 0
Missouri  1 0 0 0 0 0 0 0 0
Montana  1 0 0 0 0 0 0 0 0
Nebraska  1 0 0 0 0 0 0 0 0
Nevada  1 0 0 0 0 0 0 1 1
NewHampshire  1 1 1 1 1 1 1 0 0
NewJersey  0 1 1 0 1 1 0 1 1
NewMexico  1 1 1 0 0 0 0 0 0
NewYork  0 1 1 0 0 1 0 0 1
NorthCarolina  0 0 0 0 0 0 1 1 1
NorthDakota  1 1 1 1 1 1 1 1 0
Ohio  1 1 1 0 1 1 1 0 1
Oklahoma  1 1 1 1 1 1 1 1 0
Oregon  1 0 0 0 0 0 0 0 0
Pennsylvania  0 1 1 0 0 1 1 1 0
RhodeIsland  1 1 1 1 1 1 1 0 1
SouthCarolina  0 1 1 0 1 0 0 0 1
SouthDakota  0 1 1 1 1 1 0 0 0
Tennessee  1 1 1 1 1 1 1 0 0
Texas  1 1 1 0 0 1 0 1 1
Utah  0 1 1 1 1 1 1 1 0
Vermont  0 1 1 1 0 1 0 1 1
Virginia  0 1 0 0 0 1 0 0 1
Washington  1 0 0 0 0 0 0 0 1
WestVirginia  1 1 1 0 1 1 0 1 1
Wisconsin  1 0 0 0 0 0 0 0 1
Wyoming  1 0 0 0 0 0 0 1 1


Alan Agresti 2001-12-27

## Thompson, LA – S-Plus and R manual to accompany Agresti (2002) – Ch 02

Before we get into logistic regression and related approaches to modeling categorical data, let’s examine some of the fundamentals of modeling contingency tables.

Agresti (2002) Ch 01 covers distributions and inference for categorical data – we’re going to skip over this material.

Agresti (2002) Ch 02 introduces I × J (two-way) contingency tables:

Row variable and column variable – marginal distribution and conditional distribution

If both the row and column of a table denote random variables, then the probabilities $\{\pi_{ij}\}$ define the joint distribution of the two variables.  The marginal distributions are denoted by $\{\pi_{i+}\}$ for the row variable and $\{\pi_{+j}\}$ for the column variable.  For a fixed value $i$ of the row variable, the column variable has the conditional distribution $\{\pi_{1|i}, \ldots, \pi_{J|i}\}$.  The conditional distribution is especially important if the row variable is fixed by design (i.e., not free to vary for each observation).

Independence of row variable and column variable

Row and column variables are independent if the conditional distribution of the column variable given the row variable is the same as the marginal distribution of the column variable (and vice versa).  That is, $\pi_{j|i} = \pi_{+j}$ for $i = 1, \ldots, I$, and $\pi_{i|j} = \pi_{i+}$ for $j = 1, \ldots, J$.  Equivalently, the variables are independent if all joint probabilities equal the product of their marginal probabilities: $\pi_{ij} = \pi_{i+}\pi_{+j}$ for all $i$ and $j$.  Thus, when the two variables are independent, knowledge of the value of the row variable does not change the distribution of the column variable, and vice versa.
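
As a quick numerical check of the product rule, here is a minimal Python sketch. The two joint probability tables are hypothetical (they are not taken from Agresti), and the helper names `marginals` and `is_independent` are my own. Exact fractions are used so that the equality test is not upset by floating-point rounding.

```python
from fractions import Fraction


def marginals(joint):
    """Return (row marginals, column marginals) of a joint probability table."""
    rows = [sum(row) for row in joint]
    cols = [sum(col) for col in zip(*joint)]
    return rows, cols


def is_independent(joint):
    """True if every joint probability equals the product of its marginals."""
    rows, cols = marginals(joint)
    return all(
        joint[i][j] == rows[i] * cols[j]
        for i in range(len(rows))
        for j in range(len(cols))
    )


# Hypothetical 2 x 2 joint distributions (each sums to 1).
independent_table = [
    [Fraction(12, 100), Fraction(28, 100)],
    [Fraction(18, 100), Fraction(42, 100)],
]
dependent_table = [
    [Fraction(30, 100), Fraction(10, 100)],
    [Fraction(20, 100), Fraction(40, 100)],
]

print(marginals(independent_table))
print(is_independent(independent_table), is_independent(dependent_table))
```

In the first table every cell is the product of its row and column marginals; in the second, the first cell already differs from that product, so the variables are associated.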

Independence of explanatory variable and response variable

When the row variable is an explanatory variable and the column is a response variable, then there is no joint distribution, and independence is referred to as homogeneity of the conditional distributions of the column variable given a value for the row variable.

Maybe a little esoteric:

Sampling scheme should determine the distribution of cell counts

The distributions of the cell counts {Yij} differ depending on how sampling was done.

• If observations are to be collected over a certain period of time and cross-classified into one of the $I \times J$ categories, then a Poisson sampling model might be used where cell counts are treated as independent Poisson random variables with parameters $\{\mu_{ij}\}$.
• If the total sample size of observations is fixed in advance (e.g., in a cross-sectional study), then a multinomial sampling model might be used where cell counts are treated as multinomial random variables with index $n$ and probabilities $\{\pi_{ij}\}$.
• If the row totals are fixed in advance, perhaps as fixed-size random samples drawn from specific populations that are to be compared, as in prospective studies, then a product-multinomial sampling model may apply where, for each $i$, the counts $\{Y_{j|i}\}$ have a multinomial distribution with index $n_i$ and probabilities $\{\pi_{j|i},\ j = 1, \ldots, J\}$.
• If both row and column totals are fixed by design, then a hypergeometric sampling distribution applies for the cell counts.

Sampling scheme often does not determine the distribution of cell counts

However, there are times when certain sampling models are assumed, but sampling was actually done differently.  For example, when the row variable is an explanatory variable, a product-multinomial sampling model may be used even though the row totals were not actually fixed in advance.  Also, the Poisson model is sometimes used even when the total sample size is fixed in advance.

Section 2.2 discusses comparing two proportions from two samples, including the difference of proportions, relative risk, and odds ratio.

Without offering an explanation here, I’ll simply assert that the odds ratio is the best of these three comparisons to use.

Odds ratio

The odds ratio is the ratio of the odds of a positive response in one group to the odds in the other group:

$\theta = \frac{\pi_{1|1}/(1-\pi_{1|1})}{\pi_{1|2}/(1-\pi_{1|2})} = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}$

When θ = 1, the row and column variables are independent.  Values of θ farther from 1.0 in a given direction represent stronger association.  The odds ratio can be used with a joint distribution of the row and column variables too.  Indeed, it can be used with prospective (row totals fixed), retrospective (column totals fixed), and cross-sectional designs.  Finally, if the rows and columns are interchanged, the value of the odds ratio does not change.  The sample odds ratio uses the observed sample counts, $n_{ij}$.
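
To make the sample odds ratio concrete, here is a small Python sketch. The 2 x 2 counts are hypothetical and the function names are my own; the sketch computes the cross-product ratio, recomputes it from the two row odds, and confirms that interchanging rows and columns leaves the value unchanged.

```python
from fractions import Fraction


def sample_odds_ratio(table):
    """Cross-product ratio n11*n22 / (n12*n21) for a 2 x 2 table of counts.

    Raises ZeroDivisionError if n12 or n21 is zero.
    """
    (n11, n12), (n21, n22) = table
    return Fraction(n11 * n22, n12 * n21)


def odds(positive, negative):
    """Odds of a positive response within one group."""
    return Fraction(positive, negative)


def transpose(table):
    """Interchange the rows and columns of a table."""
    return [list(col) for col in zip(*table)]


# Hypothetical counts: rows are groups, columns are (positive, negative).
counts = [[30, 70],
          [10, 90]]

theta = sample_odds_ratio(counts)
theta_from_odds = odds(30, 70) / odds(10, 90)
theta_transposed = sample_odds_ratio(transpose(counts))

print(theta, float(theta))
print(theta == theta_from_odds == theta_transposed)
```

With these made-up counts the odds of a positive response are 3/7 in the first group and 1/9 in the second, so the odds in the first group are 27/7 (a little under four) times those in the second.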

Confounding explanatory variables and conditional association

In observational studies, confounding variables can be controlled with stratification or conditioning.  The association between two variables X and Y given that another measured variable Z takes the value z is called a conditional association.  The 2 x 2 table resulting from cross-classifying all observations with Z = z by their X and Y values is called a partial table.  If Z is ignored, the X-Y table is called a marginal table.

Simpson’s Paradox is the result that a marginal association can have a different direction than a conditional association.  For example, in the death penalty example on p. 49-51, ignoring victim’s race, the death penalty (Y) is more likely for white defendants than for black defendants (X).  However, conditioning on victim’s race (either black or white), the death penalty is more likely for black defendants than for white defendants.  The paradox in this case can be explained by two strong associations involving victim’s race (which is ignored in the marginal association): one with defendant’s race and one with the death penalty.  The death penalty was more likely when the victims were white (regardless of defendant’s race).  Also, white defendants with white victims were the most common victim/defendant race combination in the sample.  So, there are a lot of white defendants receiving the death penalty in the sample.  On the other hand, black defendants were more likely to have black victims.  Thus, there are fewer black defendants receiving the death penalty in the sample.  But, if we look at only white victims, relatively more black defendants than white defendants receive the death penalty.  The same is true for black victims.  An unmodeled association between victim’s and defendant’s race hides this conclusion.
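
The reversal can be checked numerically. The Python sketch below uses the counts as I have transcribed them from the death penalty table in Agresti; treat the numbers as an assumption and check them against the book before relying on them. The function names are my own.

```python
# (defendant's race, victim's race) -> (death penalty: yes, no)
# Counts transcribed from Agresti's death penalty example (an assumption;
# verify against the book).
counts = {
    ("white", "white"): (53, 414),
    ("white", "black"): (0, 16),
    ("black", "white"): (11, 37),
    ("black", "black"): (4, 139),
}


def rate(yes, no):
    """Proportion of defendants receiving the death penalty."""
    return yes / (yes + no)


def conditional_rate(defendant, victim):
    """Death penalty rate for one defendant race within one victim race."""
    return rate(*counts[(defendant, victim)])


def marginal_rate(defendant):
    """Death penalty rate for one defendant race, ignoring victim's race."""
    yes = sum(y for (d, _), (y, _n) in counts.items() if d == defendant)
    no = sum(n for (d, _), (_y, n) in counts.items() if d == defendant)
    return rate(yes, no)


for defendant in ("white", "black"):
    print("marginal", defendant, round(marginal_rate(defendant), 3))
for victim in ("white", "black"):
    for defendant in ("white", "black"):
        print("victim", victim, "defendant", defendant,
              round(conditional_rate(defendant, victim), 3))
```

Under these counts the marginal rate is higher for white defendants, while within each victim's race the rate is higher for black defendants, which is the change of direction described above.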

Does Simpson’s Paradox imply that we should distrust all contingency table analyses?  After all, there are undoubtedly unmeasured variables that could be potential conditioning variables in all contingency tables.  Could these variables change the direction of marginal associations?  Page 51 in Agresti paraphrases J. Cornfield’s result “that with a very strong XY association [marginal association], a very strong association must exist between the confounding variable Z and both X and Y in order for the effect to disappear or change …”.

Conditional independence

For I x J x K tables (where X has I levels, Y has J levels, and Z has K levels), if X and Y are independent in partial table k, then X and Y are conditionally independent given that Z takes on value k. If X and Y are independent at all levels of Z, then X and Y are conditionally independent given Z.

Conditional independence does not imply marginal independence.  For 2 x 2 x K tables, X and Y are conditionally independent given Z if the odds ratio between X and Y equals 1 at each category of Z.  For the general case of I x J x K tables, conditional independence is equivalent to all (I − 1)(J − 1) local odds ratios equaling 1.0 at each level of Z.
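
A small Python sketch shows how conditional independence can hold while marginal independence fails. The 2 x 2 x 2 counts are hypothetical, chosen only so that each partial table has an odds ratio of exactly 1; the names `partial_tables`, `odds_ratio` and `collapse` are my own.

```python
from fractions import Fraction


def odds_ratio(table):
    """Cross-product ratio of a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    return Fraction(a * d, b * c)


def collapse(tables):
    """Sum 2 x 2 partial tables cell by cell to get the marginal table."""
    tables = list(tables)
    return [[sum(t[i][j] for t in tables) for j in range(2)]
            for i in range(2)]


# Hypothetical partial X-Y tables, one for each level of Z.
partial_tables = {
    "z1": [[18, 12], [12, 8]],
    "z2": [[2, 8], [8, 32]],
}

conditional = {z: odds_ratio(t) for z, t in partial_tables.items()}
marginal_table = collapse(partial_tables.values())
marginal = odds_ratio(marginal_table)

print(conditional)      # odds ratio within each level of Z
print(marginal_table)   # X-Y table with Z ignored
print(marginal)         # odds ratio with Z ignored
```

Each partial table has an odds ratio of 1, so X and Y are conditionally independent given Z, yet the collapsed table has an odds ratio of 2.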

An analogy to no three-way interaction in ANOVA is homogeneous association.  A 2 x 2 x K table has homogeneous XY association if the conditional odds ratios comparing two categories of X to two categories of Y are the same at each level of Z.  When interaction exists, the conditional odds ratio for any pair of variables (say X and Y) changes across categories of the third (say Z), wherein the third variable is called an effect modifier because the effect of X on Y (the response) changes depending on the level of Z.  For the general case of I x J x K tables, homogeneous XY association means that any conditional odds ratio formed using two categories of X and two categories of Y is the same at each category of Z.

Summary measures of association – nominal data

• Kendall and Stuart’s measure of proportional reduction in variance from the marginal distribution of the response to the conditional distributions given the value of an explanatory vector; and
• Theil’s uncertainty coefficient – the proportional reduction in entropy (or uncertainty) in response given explanatory variables.

Summary measures of association – ordinal data

• concordance, and
• Gamma.

## The very basic basics of working with counted data

We’ll briefly define four related and often confused or conflated concepts in inferential statistics:

• vector of probabilities in a population
• vector of probabilities in a sample
• vector of expected counts in a sample
• vector of observed counts in a sample

For simplicity’s sake, we say that

• things may be described (at least partly) by their properties
• a property (like colour) is a set of classifications (like red, blue, green, etc.)1
• to know a thing (at least partly) is to be able to describe (at least partly) its properties

## Classification of things using one property

Now, let’s also say that we believe there’s something to learn about a thing, e.g. we believe that the thing might have some property (zargness), so we want to study the thing.

In our view of things and their properties, every individual thing in the population – i.e. the whole or part of the “real world” – can be classified as falling into one and only one of $t$ categories (into $\text{category}_1, \text{category}_2, \ldots, \text{category}_t$ of zargness, in our study). These categories are, therefore, exhaustive and mutually exclusive.

We are ignorant of the distribution of zargness among things – we can’t nominate all the categories of zargness, let alone enumerate the things whose zargness falls into a particular category. Furthermore, there are too many things for us to observe every one and determine its zargness. So, how can we make a reasonable study of the zargness of things?

Our way forward turns on the notion of random sampling from the population.  There’s an immense literature on random sampling, sampling schemes, and experimental design. The more you know about random sampling, the better off you are, to be sure. Suffice it to say, getting sampling wrong is the source of a lot of grief:

• the researcher may be unclear about the population of interest – I say “of interest”, since the population depends (except in the trivial case of “every thing in the world”) on the researcher’s question
• even when the population is well-defined, the sample of things that the researcher selects for study may not have been selected randomly – and so the sample does not provide a representative (or unbiased) view of the population of interest
• even when the sample is randomly selected, the researcher may have selected too few things – and so the sample is prone to chance effects that make for a distorted view of the population of interest

With some of the practical considerations of random sampling out of the way (!), let’s return to our example:

While we can’t nominate all the categories of zargness, and we can’t enumerate the things whose zargness falls into a particular category, we will assume that the distribution of the zargness of things (at a given point in time) can be represented as a vector of category probabilities in the population:

{℘i} = (℘1, ℘2, …, ℘t) [Eq. 1.0]

where the sum of all ℘i is unity, or

$\sum\limits_{i=1}^t \wp_{i} = 1$.

For example, let’s say that (in reality) there are three categories of zargness – the zargness of fifty percent of things is of the first type,  the zargness of twenty percent of things is of the second type, and the zargness of thirty percent of things is of the third type:

{zargnessi} = (probability of zargness1, probability of zargness2, probability of zargness3), where

{zargnessi} = (0.5, 0.2, 0.3) [Eq. 1.1]

and

$\sum\limits_{i=1}^t \wp_{i} = 1$.

Of course, in most cases (and in our example), there are simply too many things to observe and classify the zargness of all of them – that is, we’re never able to move from our model of how zargness is distributed (Eq. 1.0) to knowing how zargness is “really” distributed (Eq. 1.1) in the population of things.

But we can close the gap between Eq. 1.0 and Eq. 1.1 (i.e. estimate the set of names of categories and their respective probabilities in Eq. 1.1) using random sampling:

If we select a single thing from the population at random, the zargness of the thing will fall into one of the $t$ categories with probability $p_i$, where $p_i$ is a function of the relative frequency (or probability) of each category of zargness among things in the population:

{pi} = (p1, p2, …, pt) [Eq. 1.3]

where

$\sum\limits_{i=1}^t p_{i} = 1$.

Note that this vector of category probabilities in the sample (the probabilities that the zargness of a single thing selected at random will fall into each category $i$) reflects the vector of probabilities that describes the distribution of the categories of zargness among all things in the population.

So, let’s make a second selection of a thing from the population and determine that thing’s zargness. As soon as we’re looking at making more than one random selection, we must consider the sampling plan to be used to make these selections. The vector of probabilities (Eq. 1.0) governing the second selection is “the same” as the vector of probabilities for the first selection provided that one of two conditions applies:

• the thing that was selected when we made the first selection is “replaced” in the population before we make the second selection – sampling with replacement, or
• the thing that was selected when we made the first selection is not “replaced” in the population before we make the second selection – sampling without replacement – but the size of the population of things is infinitely large, and the removal of the first thing from the population makes no material difference in the vector of probabilities.

In other words, if either of these conditions continues to apply as we take a turn at randomly selecting a thing from the population, the things remaining for selection from the population will continue to represent three categories of zargness – where fifty percent of things continues to have zargness of the first type, twenty percent of things continues to have zargness of the second type, and thirty percent of things continues to have zargness of the third type. That is, on every turn at randomly selecting a thing from the population,

$\{℘_i\} = (℘_1, ℘_2, \ldots, ℘_t)$ [after Eq. 1.0]

continues to be

$\{\text{zargness}_i\} = (0.5, 0.2, 0.3)$ [after Eq. 1.1]

And so, provided that one of these two conditions applies as we take turns at randomly selecting things from the population, the zargness of the thing selected on any given turn will continue to fall into one of the $t$ categories with probability $p_i$, where $p_i$ is a function of the relative frequency (or probability) of each category of zargness among things in the population:

$\{p_i\} = (p_1, p_2, \ldots, p_t)$ [after Eq. 1.3]
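The two conditions can be checked with a little arithmetic. The sketch below is illustrative only: the two population sizes (10 and 1,000,000 things) and the function names are assumptions made for the example. It removes one thing of the first type without replacing it and recomputes the category probabilities.

```python
from fractions import Fraction

def probabilities(counts):
    """Return the category probabilities implied by a list of category counts."""
    total = sum(counts)
    return [Fraction(c, total) for c in counts]

def after_removing(counts, category):
    """Probabilities after one thing of the given category is removed (no replacement)."""
    remaining = list(counts)
    remaining[category] -= 1
    return probabilities(remaining)

# Hypothetical populations, both with the 50/20/30 split of Eq. 1.1
small = [5, 2, 3]
large = [500_000, 200_000, 300_000]

for counts in (small, large):
    before = probabilities(counts)
    after = after_removing(counts, 0)
    # Largest change in any category probability caused by the removal
    shift = max(abs(float(a - b)) for a, b in zip(after, before))
    print(sum(counts), [float(p) for p in after], shift)
```

In the small population the removal visibly changes the probabilities for the next selection; in the large one the change is negligible, which is the sense in which a very large population behaves like sampling with replacement.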

If we take $N$ turns at randomly selecting a thing from the population (as above), we expect to obtain $m_1 = E(x_1) = p_1 \times N$ things having $\text{zargness}_1$, $m_2 = E(x_2) = p_2 \times N$ things having $\text{zargness}_2$, etc. – yielding the following vector of expected category counts in the sample:

$\{m_i\} = (m_1, m_2, \ldots, m_t)$ [Eq. 1.4].

The vector of observed category counts in the sample, $\{x_i\}$, is

$\{x_i\} = (x_1, x_2, \ldots, x_t)$ [Eq. 1.5],

where

$\sum\limits_{i=1}^t x_{i} = N$.

As the number of things that are selected randomly from the population increases, the differences between the observed category proportions, $x_i/N$, and the expected category proportions, $m_i/N = p_i$, decrease. As $N$ approaches infinity – or at least, is “sufficiently large” – $\{x_i\}$ and $\{m_i\}$ contain the same category classifications and, proportionally, the same category counts.
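A small simulation makes the relationship between observed and expected counts concrete. This is a minimal sketch, assuming sampling with replacement from the population of Eq. 1.1; the function names, the seed, and the sample sizes are choices made for the example.

```python
import random

def sample_counts(probs, n, seed=0):
    """Draw n things with replacement and count how many fall in each category."""
    rng = random.Random(seed)
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    counts = [0] * len(probs)
    for category in draws:
        counts[category] += 1
    return counts

def expected_counts(probs, n):
    """Expected category counts: each probability multiplied by the sample size."""
    return [p * n for p in probs]

probs = [0.5, 0.2, 0.3]  # the population probabilities of Eq. 1.1
for n in (100, 10_000, 1_000_000):
    x = sample_counts(probs, n)
    m = expected_counts(probs, n)
    # Largest gap between observed and expected counts, as a share of the sample
    worst = max(abs(xi - mi) / n for xi, mi in zip(x, m))
    print(n, x, round(worst, 4))
```

The printed gap is expressed as a share of the sample size; it is this relative gap, not the raw difference in counts, that shrinks as the sample grows.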

This general approach can be extended when we are concerned with classifying things using two or more properties.

Conceptually, the chain of inference from our observation of the properties of things in a random sample to some understanding of the properties of things in the population of things is:

$\{x_i\} \rightarrow \{m_i\} \rightarrow \{p_i\} \rightarrow \{℘_i\}$

The “Frequentist” (aka traditional, standard) school of statistical thinking would say that the “weakest” link in this chain of inference is $\{x_i\} \rightarrow \{m_i\}$ – and a great deal of its effort and ingenuity has gone into developing various statistical methods and analytic techniques to estimate the expected counts $\{m_i\}$ from the observed counts $\{x_i\}$ for a variety of sampling schemes. The “Bayesian” school of statistical thinking takes a profoundly different view of how to approach the business of connecting the subjective and objective, observation and “reality”.

We’ll deal with the application of some of the differences between the Frequentist and Bayesian frameworks in the context of a few specific projects that involve mining or modeling marketing or consumer choice data. For now, we hope the reader is clearer about the concepts of

• probabilities in a population
• probabilities in a sample
• expected counts in a sample
• observed counts in a sample

1. In our world, there are no properties (like length) that describe things with values (like inches) that fall along a continuum.

## Why promotion strategies based on market basket analysis may not work

The title of Vindevogel et al’s paper is thought-provoking, but also somewhat misleading – their position is not that promotion strategies based on MBA do not (ie never) work, but that the dynamics of sales response to price change are complex and there are identifiable reasons why a particular promotion strategy based on a particular MBA will not/did not work.

Time series analysis may be adapted to MBA to help the market analyst understand these underlying dynamics – and thereby to adopt better promotion strategies.

This paper is a great example of the caution I’d raise about a lot of MBA – too much of the time the use of algorithms is uninformed by real-world knowledge (expressed as hypotheses or at least some a priori notion of how the data may behave). This lack of informed engagement on the part of the subject-matter expert in the analysis leads to various shortcomings with a lot of applied MBA:

1. data collection that’s inefficient
2. data analysis that amounts to what I call “magical thinking” – others have called it “data dredging” or “conducting a fishing expedition”
3. use of biased estimates of association effects
4. a flood of “spurious” associations between/among products that amount to Type I errors (ie “false positives”) in the standard theory of statistical inference (that are due to/to be expected from the vast number of effective tests of hypothetical associations)
5. most “legitimate” (ie statistically significant, when proper confidence values are used to assess multiple hypotheses) associations  are likely to be of no substantive or practical significance – and have little prospect of yielding a marketing strategy that will have any positive impact on sales

Some promising news: this paper suggests some ways of moving forward with MBA – other papers carry these suggestions further, particularly the necessity and possible means of evaluating the impact of your marketing strategies!

FYI – I’ve pushed some of the more technical details of the time-series analysis to the end of the post – the details are not needed to understand the thrust of the argument.

A near-to-next step is to identify a suite of R packages to carry out this analysis.

## Key notions:

• positive vs. negative cross-product price elasticities (aka cross-price elasticities)
• short-run vs. long-run (aka persistent) impacts of price promotion on sales
• nature of association between products
• complements
• substitutes
• [in another paper we’ll see the distinction between complements-in-purchase vs. complements-in-use, and ditto for substitutes]
• consumer-demand effect vs. competitive effect in response to price change
• using a well-defined, customized time-series analysis to determine dynamic (short-run vs. long-run impact) of sales response (none vs. positive vs. negative) to price change as a function of the nature of the association between products + consumer-demand effect + competitive effect

## Introduction

In a recent Journal of Marketing article, Shocker, Bayus, and Kim (2004) call for a better understanding of the connectedness among products. Indeed, it is clear and well known that the purchase of one product can influence purchases of other products. The underlying dynamics of these processes, however, remain less clear.

In the data-mining community, association-rule discovery is a popular technique to analyze the connectedness between sets of products. The framework of association rules was introduced by Agrawal, Imielinski, and Swami (1993) to efficiently mine a large collection of transactions for patterns of consumer purchase behavior. Since the first publications concerning association rules discussed the mining of retail databases, this technique is also referred to as market basket analysis.

The result of a market basket analysis is a set of combinations of products that are purchased together. Since the publication of the paper by Agrawal et al. (1993), literally hundreds of publications followed based on the proposed framework. However, as Hand, Mannila, and Smyth (2001) state: “It is fair to say that there are far more papers published on algorithms to discover association rules than there are papers published on applications of it”.

In scientific publications, there are only a few applications of market basket analysis in a retailing context. Examples of these rare applications include:

In the popular business press or text books, however, other, intuitively appealing applications are mentioned when discussing market basket analysis, including the use of market basket analysis to implement more effective price-promotion strategies. The underlying assumption is that associated products exhibit positive cross-price elasticities or, otherwise stated, that a price promotion has a positive impact on the sales of associated products. Market basket analysis is then used to select price promotions that will have a beneficial impact on the sales of full-margin associated products.

A reflection of this belief can be illustrated by the following citation of an article by two data-mining consultants in the popular business press: “Business managers or analysts can use a market basket analysis to plan couponing and discounting. It is probably not a good idea to offer simultaneous discounts on [two products] if they tend to be bought together. Instead, discount one to pull in sales of the other.” (Brand & Gerritsen, 1998).

This belief was also found in text books on data mining. Giudici (2003, p. 209) for example writes: “by promoting just one of the associated products, it should be possible to increase the sales of that product and get accompanying sales increases for the associated products.”

In this paper, we investigate whether the assumption of positive cross-price elasticities between associated products stands firm. We have reasons to believe that associated products do not necessarily show positive cross-price elasticities. Indeed, literature suggests that consumers tend to buy several products from the same category during a single shopping trip. This behavior is known as “horizontal variety seeking“, and can result in association rules between substitute products, which are expected to show negative cross-price elasticities.

Dubé (2004), for example, shows that 31% of shopping trips involving carbonated soft drinks result in the purchase of two or more different products. This behavior can result in association rules between sets of these products. However, as Dubé (2004) shows, these products can be classified as being substitutes, since price changes of a product result in switching behavior. Hence, although these products are associated, a price promotion will result in a negative impact on the sales of the associated product.

In this research, we use the transactional database of a retailer to mine for association rules. In a second phase, we derive  the effect of price promotions on the associated products analytically. This enables us to generalize empirically whether market basket analysis can be used effectively to implement a price-promotion strategy.

## Methodology

Figure 1 illustrates the methodological framework of this study. Our analysis proceeds in two phases:

1. mining a transactional database of a retailer for association rules
2. applying multivariate time-series techniques to measure the dynamic impact of a price promotion:
    1. estimating a vector autoregressive (VAR) model
    2. deriving the impulse response function, which measures the dynamic impact of a sales promotion on the sales of the associated product
    3. computing the cross-price elasticity from these responses
    4. classifying the relationships as being substitutes, complements or independent, depending on the sign of these elasticities

Figure 1. Methodological framework.

Market basket analysis is a generic term for methodologies that study the composition of a basket of products purchased by a household during a single shopping trip. Agrawal et al. (1993) first introduced the association-rule framework to study market baskets. Originally, this framework consisted of two parameters: support and confidence. Silverstein, Brin and Motwani (1998) extended this framework by a third parameter: interest. More specifically, the three parameters are defined as follows:

Consider the association rule Y→Z, where Y and Z are two products. Y represents the antecedent and Z is called the consequent.

Support of the rule: the percentage of all baskets that contain both products Y and Z

support = P(Y∧Z)

Confidence of the rule: the percentage of all the baskets containing Y that also contain Z. Hence, confidence is a conditional probability, i.e. P(Z|Y)

confidence = P(Y∧Z)/P(Y)

Interest of the rule: measures the statistical dependence of the rule, by relating the observed frequency of co-occurrence – P(Y∧Z) – to the expected frequency of co-occurrence under the assumption of conditional independence of Y and Z – P(Y)*P(Z)

interest = P(Y∧Z)/(P(Y) * P(Z))

Association-rule discovery is the process of finding strong product associations with a minimum support and/or confidence and an interest of at least one.
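The three parameters can be computed directly from a list of baskets. The sketch below is only an illustration of the definitions above: the function name and the toy baskets are invented for the example, and it assumes the antecedent and the consequent each occur in at least one basket.

```python
def rule_metrics(baskets, y, z):
    """Support, confidence and interest of the rule y -> z over a list of baskets."""
    n = len(baskets)
    n_y = sum(1 for b in baskets if y in b)
    n_z = sum(1 for b in baskets if z in b)
    n_yz = sum(1 for b in baskets if y in b and z in b)
    support = n_yz / n                              # P(Y and Z)
    confidence = n_yz / n_y                         # P(Y and Z) / P(Y)
    interest = support / ((n_y / n) * (n_z / n))    # P(Y and Z) / (P(Y) * P(Z))
    return support, confidence, interest

# Toy baskets, invented for illustration
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]
support, confidence, interest = rule_metrics(baskets, "bread", "butter")
print(support, confidence, interest)
```

An interest above one means the two products appear together more often than independence would predict; it says nothing yet about the sign of the cross-price elasticity, which is the point of the paper.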

## Measuring the dynamic effect of a price promotion using multivariate time-series techniques

### A rationale for using multivariate time-series techniques

Recent research concerning the effects of price promotions is characterized by an increasing use of multivariate time-series techniques.

(Price promotion) ?⇒? (Short- and Long-run Impact on Sales) – Dekimpe, Hanssens, and Silva-Risso (1999)

• short-run effects of price promotion were found for most cases
• long-run effects were the exception

(Price Promotion) ?⇒? (Category Demand) – Nijs, Dekimpe, Steenkamp, and Hanssens (2001a)

• short-run impact in 58% of the cases, with a duration of 10 weeks on average
• long-run effects are exceptional – observed in 2% of the cases

(Price Promotion) ?⇒? (Category Incidence | Brand Choice | Purchase Quantity) – Pauwels, Hanssens, and Siddarth (2002)

• significant short-term effects for each of the sales components, with durations of up to 8 weeks
• in the long run, however, each sales component lacks a persistent promotion effect

(Temporary | Evolving | Structural Price Change) ?⇒? (Market Share) – Srinivasan, Leszczyc, and Bass (2000)

• temporary price changes and price promotions have only a short-run effect on market share
• structural changes or evolving prices have long-run effects on market share

Although long-term effects of price promotions/changes on sales/sales components are rare, there is still a rationale for the use of multivariate time-series techniques to study these sorts of effects:

1. the rare case of a persistent effect of a price promotion on sales may be of high strategic relevance – being able to measure and explain these rare occurrences should be of interest to the marketing community
2. time-series techniques are more flexible in measuring the short-run dynamics, which are observed in all studies. Indeed, time-series analysis is able to detect the most irregular fluctuations in the short-run promotional effects, whereas other techniques, like the Koyck model, necessitate an a priori specification of the response, which is usually a gradually decaying response
3. in measuring the cross-price elasticity, the use of multivariate time-series techniques shows another benefit – as Nijs, Dekimpe, Steenkamp, and Hanssens (2001b) argue, a cross-sales effect can have two sources:
    1. a price promotion results in an increase in demand of the promoted product – causing changes in the demand of complement and substitute products = consumer-demand effect
    2. a price promotion can cause marketing reactions of the associated product, which obviously also results in a change in demand of the associated product = competitive effect
4. an advantage of time-series analysis is that it simultaneously accounts for the two underlying sources of a cross-sales effect through the derivation of impulse-response functions

[See below for technical details of study design and analysis]

Figure 2 illustrates an impulse-response function with a short-run effect. It concerns the response of sales of toasted bread to a price promotion of instant soup. The significant responses are labeled with a dot.

Figure 2. Response of the sales of toasted bread to a price promotion of instant soup.

We observe a positive and significant immediate response of 1.31 (period 0). Weeks 1–3 are characterized by a small, albeit insignificant, post-promotion dip, followed by a period of purchase reinforcement in weeks 4–7. Week 7 is the first week that is followed by four non-significant responses, hence week 7 is the end of the dust-settling period. The short-run cross-price elasticity is derived by the summation of the responses during the dust-settling period, which yields a positive cross-price elasticity of 3.06. Since the responses converge to zero, there is no long-term cross-price effect.

Figure 3 illustrates an impulse-response function with a persistent, long-run effect. It concerns the response of vanilla-flavored ice-cream to a price promotion of chocolate-flavored ice-cream. The graphical representation of the impulse-response function clearly shows that the responses converge to a persistent level of -0.97, which is the (negative) long-run cross-price elasticity. Here, the responses labeled with a dot indicate a response significantly different from the long-run effect. Week 10 is the first week that is followed by four non-significant responses, hence week 10 marks the end of the dust-settling period. The short-run cross-price elasticity is the sum of all the responses in the dust-settling period, which results in this case in a negative cross-price elasticity of -4.85.

Figure 3. Response of the sales of vanilla-flavored ice-cream to a price promotion of a chocolate-flavored ice-cream.

## Empirical results

The relationship between the promoted product and the associated product is classified as being independent, substitute or complementary depending on the direction of the cross-price elasticity.

First, we investigate the long-run cross-price elasticity. If this measure appears to be positive, we label the relationship as being complementary, whereas a negative persistent effect indicates a substitution relationship. In the absence of a long-run effect, we use the short-run estimates to classify the relationship. Again, a positive cross-sales effect is classified as being complementary, while a negative effect is classified as substitute. In the absence of a short-run effect, we classify the relationship as being independent.
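The classification scheme amounts to a short decision rule. The following is a minimal sketch of it; the function name, the use of `None` to stand for "no significant effect", and the example values are assumptions made for illustration.

```python
def classify(long_run, short_run):
    """Classify a product relationship from its cross-price elasticities.

    An argument of None stands for "no significant effect" (an assumed convention).
    """
    for effect in (long_run, short_run):  # the long-run effect takes precedence
        if effect is not None and effect != 0:
            return "complement" if effect > 0 else "substitute"
    return "independent"

# Illustrative values only
print(classify(0.7, -0.2))   # a long-run effect exists, so the short-run value is ignored
print(classify(None, -1.4))  # no long-run effect: fall back on the short-run sign
print(classify(None, None))  # neither effect is present
```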

Applying the aforementioned classification scheme, the 2700 relationships are classified in the following way:

• 1112 relationships are classified as being complements
• 1212 relationships are classified as being substitutes, and
• 376 relationships are classified as being independent.

In Table 1, we give examples of relationships that are classified as complements and substitutes. For both complements and substitutes, we list five relationships that were classified based on their long-run elasticity and five based on their short-run elasticity, resulting in 20 examples.

### Complements

In 60 instances, a price promotion has a long-run (persistent) positive effect on the sales level of the associated product.

Persistent effects – complementary products

| Promoted product | Reacting product | Persistent response |
| --- | --- | --- |
| Fries | Mayonnaise | 0.73709 |
| Low-fat yoghurt | Kiwi | 0.28199 |
| Pork-cutlet | Carrots | 0.25441 |
| Paprika | Onion | 0.20304 |
| Bread | Butter | 0.01591 |

Short-run effects – complementary products

| Promoted product | Reacting product | Short-run response | Duration (weeks) |
| --- | --- | --- | --- |
| Coffee | Cream | 5.58170 | 4 |
| Shampoo | Conditioner | 3.56291 | 9 |
| Instant soup | Toasted bread | 3.06367 | 7 |
| Plastic plate | Plastic cup | 2.64980 | 15 |
| Bread | Smoked hams | 0.99901 | 5 |

Persistent effects – substitution products

| Promoted product | Reacting product | Persistent response |
| --- | --- | --- |
| Tomato ketchup | Curry ketchup | -1.09531 |
| Vanilla ice-cream | Chocolate ice-cream | -0.97388 |
| Chicken soup | Tomato soup | -0.87933 |
| Bottled water | Coca-Cola | -0.46227 |
| Chocolate biscuit | Spiced biscuit | -0.37886 |

Short-run effects – substitution products

| Promoted product | Reacting product | Short-run response | Duration (weeks) |
| --- | --- | --- | --- |
| Tzatziki | Feta | -1.38876 | 6 |
| Tuna salad | Salmon salad | -1.06881 | 15 |
| Cheese pie | Cherry pie | -0.69528 | 14 |
| Orange juice | Apple juice | -0.61342 | 16 |
| Rose-hip tea | Yellow tea | -0.22104 | 4 |

Table 1. Examples of cross-price elasticities between associated products.

Following our classification scheme, these instances are classified as complements. The mean value of this persistent cross-price elasticity is 0.89. The other 1052 complementary relationships are classified as being complements based on a positive short-run cross-price elasticity. The mean short-run elasticity is 4.56. These short-run dynamics take on average 13 weeks to stabilize.

### Substitutes

Forty-two cross-price elasticities exhibit a persistent negative sign, meaning that the price promotion has a persistent negative effect on the sales of the associated product. The mean value of the 42 elasticities is -0.62. The short-run dynamics have a mean value of -4.59. It takes on average 16 weeks for the short-run dynamics to stabilize.

Although we only considered product couples that can be labeled as product associations, it is remarkable that we can classify even more relationships as substitutes (1212) than as complements (1112). This finding contradicts the intuitively appealing business idea that association rules can be used by retailers to implement more effective promotion strategies. Indeed, the underlying hypothesis that product associations necessarily show positive cross-price elasticities does not seem to hold.

On the contrary, as we have shown, a price promotion is more likely to lower the sales of the associated product than to raise them. Indeed, observing that customers tend to buy two products on the same shopping occasion does not imply a complementary relationship between these two products.

Consumers can buy products together for a variety of reasons (see Manchanda, Ansari, & Gupta, 1999, or Böcker, 1978).

In particular, horizontal variety seeking behavior can result in association rules between two products which are actually substitutes. Horizontal variety seeking involves the simultaneous purchase of multiple varieties (Kim, Allenby, & Rossi, 2002). The adjective horizontal is added to make a clear distinction from (temporal) variety seeking behavior, which describes the process of temporal changes in tastes from purchase occasion to purchase occasion (see for example McAlister & Pessemier, 1982).

Dubé (2004) lists three reasons for the occurrence of horizontal variety seeking:

1. on a given shopping trip, consumers typically make purchases involving several consumption occasions. If preferences differ across these consumption occasions, this results in the simultaneous purchase of substitutes. A consumer can buy, for example, two flavors of tea, when he prefers flavor A in the morning, whereas he prefers flavor B in the evening.
2. the purchase of a variety of products from the same assortment may be a response to the consumer’s uncertainty about his future preferences
3. a consumer can make purchases for a complete household. This results in horizontal variety seeking, if preferences differ among the household members

## Conclusions

In this research, we have shown empirically that market basket analysis is not a good technique for implementing more efficient promotion strategies. The underlying assumption that associated products are necessarily complements with positive cross-price elasticities cannot be validated. This assumption ignores the occurrence of horizontal variety seeking, which results in the simultaneous purchase of substitutes. We measured the cross-price elasticities of 1350 product associations and conclude that a price promotion is more likely to result in a decrease in the sales of the associated product. For this reason, we advise retailers not to build a promotion expert system based on a market basket analysis. Such a system should instead be built on the derivation of cross-price elasticities. In particular, multivariate time-series techniques, which account for dynamic effects, can be used to implement more efficient promotion strategies that benefit from positive cross-sales effects.

## Technical details of the design and analysis

### Unit root tests

The first step of our analysis involves the testing for unit roots. Those tests are necessary, since variables that appear to be non-stationary have to be differenced before entering the model, whereas stationary variables enter the model in levels. Moreover, the presence of a unit root is a necessary condition for the existence of long-run effects (Dekimpe & Hanssens, 1995b).

Augmented Dickey-Fuller tests were used to test for the presence of a unit root. We used the testing scheme proposed by Enders (1995) (see Appendix A). The optimal lag length for the autoregressive part of the test was chosen using the Schwarz Bayesian Criterion (SBC).

This testing procedure classifies each series as a unit-root process, a stationary process or a trend-stationary process.

### Vector autoregressive (VAR) models

For each selected product couple, we estimate a four-equation VAR model, with the prices and sales of both products as endogenous variables. We thereby control for factors that may influence sales, which are estimated as exogenous variables. More specifically, we estimate the effect of the featuring of the two products in the weekly folder of the retailer and the effect of the total sales per week of the retailer, which controls for external factors that may influence the sales of the two products. When one of the endogenous variables appears to be trend-stationary, a trend variable is included in all equations. Hence, for every product association, the following system is estimated:

As mentioned earlier, endogenous variables that have a unit root enter the system in differences.

### Impulse response functions

The effects of a price promotion on the sales of the associated product are estimated by deriving impulse response functions from the estimated VAR systems.

Formally, price promotions are operationalized as one-time unit shocks of the price variable in the VAR model in levels. The impact of a price promotion of product A on the sales of product B, for example, is operationalized by setting the value $\varepsilon_{P_a,t}$ to -1 and measuring the over-time impact on $\ln(S_b)$ of this one-time unit shock. From each VAR model we derive two impulse-response functions, measuring cross-price elasticities. Firstly, we estimate the response of the sales of product B to a price promotion of product A, and, secondly, we estimate the response of the sales of product A to a price promotion of product B.
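To see what "measuring the over-time impact" of a one-time shock means mechanically, here is a deliberately simplified sketch. It assumes a VAR with a single lag, no exogenous variables and only two endogenous variables, which is much smaller than the four-equation system used in the paper; the coefficient matrix is hypothetical and the function name is invented for the example.

```python
def impulse_response(coeffs, shock, horizon):
    """Responses of a VAR(1) in levels, y_t = A y_{t-1} + e_t, to a one-time shock.

    Returns a list of vectors: the response at periods 0, 1, ..., horizon.
    """
    responses = [list(shock)]
    for _ in range(horizon):
        previous = responses[-1]
        # Next-period response is the coefficient matrix applied to the previous one
        responses.append([sum(a * v for a, v in zip(row, previous)) for row in coeffs])
    return responses

# Hypothetical two-variable system ordered as [price of A, ln(sales of B)]
A = [[0.5, 0.0],
     [-0.4, 0.3]]
shock = [-1.0, 0.0]  # one-time unit price cut on product A

for period, response in enumerate(impulse_response(A, shock, 5)):
    print(period, round(response[1], 4))  # response of ln(sales of B)
```

With these made-up coefficients the responses die out, so the example has a short-run effect but no persistent one.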

In a VAR system, the instantaneous effects cannot be estimated directly, but are reflected in the variance–covariance matrix of the residuals. We use the method proposed by Evans and Wells (1983) to derive the instantaneous effects (for an explanation of this method, and an argumentation to use this method in this research, we refer to Appendix B).

In order to derive confidence intervals for the estimated responses, we used a bootstrap method. To that end, elements from the residuals are randomly drawn with replacement. Based on these residuals and the parameters estimated for the VAR model, we create new values for the four endogenous time series. We then re-estimate the parameters of the VAR model using these new time series, and impulse response functions are derived based on this model. This procedure is repeated 500 times. Finally, the sample standard error is computed for these 500 response values. Using this standard error, we compute the t-values of each response. Responses with an absolute t-value higher than 1 are labeled as significant.

We follow Nijs et al. (2001a) in deriving a short-term and a long-term effect from the estimated impulse-response function. A long-term or persistent effect occurs when the asymptotic value of the response ($t \rightarrow \infty$) is significantly different from zero. Short-run effects are the summation of all the impulses over the dust-settling period. The dust-settling period ends at the first period which is followed by four non-significant responses.
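The dust-settling rule can be written out in a few lines. This sketch takes a literal reading of the rule (the first period whose next four responses are all non-significant), tests significance against zero with the |t| > 1 threshold mentioned above, and sums every response up to that period. The responses and t-values are invented, and the function names are assumptions for the example.

```python
def dust_settling_end(t_values, threshold=1.0, run=4):
    """Index of the first period followed by `run` non-significant responses.

    Returns None if no such period exists within the estimated horizon.
    """
    significant = [abs(t) > threshold for t in t_values]
    for period in range(len(significant) - run):
        if not any(significant[period + 1 : period + 1 + run]):
            return period
    return None

def short_run_effect(responses, t_values):
    """Sum of all responses from period 0 to the end of the dust-settling period."""
    end = dust_settling_end(t_values)
    if end is None:
        return None
    return sum(responses[: end + 1])

# Invented responses and t-values, for illustration only
responses = [1.2, -0.1, 0.6, 0.5, 0.05, 0.02, -0.01, 0.0]
t_values = [3.0, -0.4, 2.1, 1.8, 0.3, 0.2, -0.1, 0.0]
print(dust_settling_end(t_values), short_run_effect(responses, t_values))
```

Where a persistent effect exists, significance would have to be judged against the long-run level rather than against zero, which this sketch does not attempt.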

## Description of the data

For our analysis, we used the transactional database of a large European retailer, which contains the sales transactions of six outlets between July 7th 1999 and March 26th 2003 of 15,017 different SKU’s.

First, we took a sample of all the transactions of 2002, and computed the support and interest measure for all possible combinations of two SKU’s. Since the database covers the sales of more than 15,000 different SKU’s, this results in the estimation of support and interest for more than 112,492,500 SKU pairs. We labeled a combination as a product association if it has an interest larger than two and a support exceeding 0.0157. For the selected product associations we computed six variables on a weekly basis, which results in 194 weekly observations of the price of the two products (Pa and Pb), the sales in units of the two products (Sa and Sb), and, for both products, a dummy variable that indicates whether the product was featured in the folder (Fa and Fb).
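The pair count quoted above is simply the number of unordered pairs that can be formed from the SKU's. A quick check, with the selection thresholds written as a hypothetical helper (the function names are assumptions for the example):

```python
import math

def pair_count(n_items):
    """Number of unordered pairs that can be formed from n_items distinct items."""
    return math.comb(n_items, 2)

def is_association(support, interest, min_support=0.0157, min_interest=2.0):
    """Selection rule described above: both thresholds must be exceeded."""
    return support > min_support and interest > min_interest

print(pair_count(15_000))  # the round figure of 15,000 SKU's
print(pair_count(15_017))  # the full assortment of 15,017 SKU's
```

Using the round figure of 15,000 SKU's gives 15,000 × 14,999 / 2 = 112,492,500 pairs, which is where the "more than 112,492,500" in the text comes from.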

Since we are interested in the impact of price promotions on the sales of the associated product, we required that the price series of both products contain at least one price promotion in the 194 weeks. A price promotion was defined following the heuristic procedure in Abraham and Lodish (1993): a price promotion occurs when the price is reduced by at least 5% and is then raised again by at least 3% within the following 8 weeks. Weeks in which the product was featured in the folder do not count toward the 8-week period.
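One possible reading of this heuristic is sketched below. The text leaves the reference prices unspecified, so the sketch makes several assumptions that should be treated as such: the 5% drop is measured against the previous week's price, the 3% rise against the promotional price, and featured weeks are skipped entirely when counting the 8-week window. The function name and the price series are invented.

```python
def find_promotions(prices, featured=None, drop=0.05, rebound=0.03, window=8):
    """Return the week indices at which a price promotion starts.

    Assumptions of this sketch: the drop is measured against the previous
    week's price, the rebound against the promotional price, and featured
    weeks are skipped when counting the window.
    """
    featured = featured or [False] * len(prices)
    starts = []
    for week in range(1, len(prices)):
        if prices[week] > prices[week - 1] * (1 - drop):
            continue  # no price cut of at least `drop` this week
        counted = 0
        for later in range(week + 1, len(prices)):
            if featured[later]:
                continue  # featured weeks do not count toward the window
            counted += 1
            if counted > window:
                break
            if prices[later] >= prices[week] * (1 + rebound):
                starts.append(week)
                break
    return starts

# Invented weekly prices: a two-week price cut, then a permanent price cut
print(find_promotions([2.00, 2.00, 1.80, 1.80, 2.00, 2.00]))
print(find_promotions([2.00] + [1.80] * 10))
```

The second series illustrates why the rebound condition matters: a price cut that never reverses is a structural price change, not a promotion.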

These restrictions result in 1350 selected product associations. As mentioned, we successively simulate a price promotion in both products and measure the impact of the promotion on the sales of the associated product, which results in the estimation of 2700 cross-price elasticities, both for the short and the long run.

## Appendix A

### Testing for unit roots using ADF-tests – Dolado, Jenkinson, and Sosvilla-Rivero (1990) and Enders (1995).

Testing for unit roots using ADF-tests.

## Appendix B

### Deriving instantaneous effects – Evans and Wells (1983)

In a VAR system, the instantaneous effects cannot be estimated directly, but are reflected in the variance–covariance matrix of the residuals. For example, if we observe a high covariance between the errors of the sales of products A and B, we can infer that there is a high instantaneous effect between sales of products A and B. A problem, however, remains that we cannot directly observe the direction of these instantaneous effects. In our example, we do not directly know whether it is the sales of product A that have an immediate effect on the sales of product B, or whether it is the other way around, i.e. sales of product B that influence the sales of product A.

This problem is traditionally solved by imposing restrictions on the instantaneous effects. These restrictions impose a priori a causal ordering of the instantaneous effects. In a system with $n$ endogenous variables, we need $(n^2-n)/2$ restrictions for identification, which resolves to six restrictions in our four-equation model. Imposing these six restrictions in our particular setting seems to be problematic, however. While we could reasonably assume that feedback effects from sales to price take some time to materialize, and hence restrict the instantaneous effects from sales to price to zero, this only yields four restrictions (sales A does not have an instantaneous effect on price A and price B, and sales B does not have an instantaneous effect on price A and price B). Imposing further restrictions cannot be done on a theoretical basis. If we observe an instantaneous effect between the price of A and B, for example, there are no theoretical grounds to impose that price A has an instantaneous effect on price B, while price B can only affect price A after one week.

To circumvent this problem, we use the method proposed by Evans and Wells (1983) (see Dekimpe & Hanssens, 1999 for an application in marketing) to estimate the instantaneous effects, since this method does not require imposing restrictions.

This method models instantaneous effects as the expected value of the error terms given a particular shock, assuming that the error terms follow a multivariate normal distribution. Formally, the expected instantaneous effect on variable j of a shock of size k to variable i is computed as

$$E(\varepsilon_j \mid \varepsilon_i = k) = \frac{\sigma_{ij}}{\sigma_{ii}}\, k$$

where $\sigma_{ij}$ is the corresponding element in the variance–covariance matrix of the residuals.

Applying this method to our setting, a price promotion of product A is operationalized as a shock in the residual vector of

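$$\left[ -\frac{\sigma_{P_a,S_a}}{\sigma_{P_a,P_a}},\; -\frac{\sigma_{P_a,S_b}}{\sigma_{P_a,P_a}},\; -1,\; -\frac{\sigma_{P_a,P_b}}{\sigma_{P_a,P_a}} \right]$$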
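As a minimal sketch of this computation, the snippet below builds the shock vector for a unit price cut on product A from a residual variance–covariance matrix. The matrix values, the variable ordering (sales A, sales B, price A, price B) and the function name are assumptions made purely for illustration; they are not taken from the estimated model.

```python
def expected_instantaneous_effects(cov, shocked, k):
    """Return E(eps_j | eps_i = k) = (sigma_ij / sigma_ii) * k for every j.

    cov     -- residual variance-covariance matrix (list of lists)
    shocked -- index i of the variable receiving the shock
    k       -- size of the shock
    """
    sigma_ii = cov[shocked][shocked]
    return [cov[shocked][j] / sigma_ii * k for j in range(len(cov))]


# Hypothetical residual covariance matrix (illustrative values only).
# Assumed ordering: sales A, sales B, price A, price B.
cov = [
    [4.0, 1.0, -1.0, 0.2],
    [1.0, 3.0, 0.5, -0.8],
    [-1.0, 0.5, 2.0, 0.1],
    [0.2, -0.8, 0.1, 2.0],
]

PRICE_A = 2  # position of price A in the assumed ordering

# A price promotion of product A: a shock of k = -1 to the price A residual.
shock_vector = expected_instantaneous_effects(cov, PRICE_A, k=-1.0)
print(shock_vector)
```

The entry for price A itself is always equal to k, and every other entry is k scaled by the covariance with price A relative to the variance of price A, which matches the structure of the vector above.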

## References

Abraham, M. M., Lodish, L. M. (1993). An implemented system for improving promotion productivity using store scanner data. Marketing Science, 12(3), 248–269.

Agrawal, R., Imielinski, T., Swami, A. (1993). Mining association rules between sets of items in large databases. Proceedings of the 1993 AC (pp. 207–216).

Anand, S. S., Patrick, A. R., Hughes, J. G., Bell, D. A. (1998). A data mining methodology for cross-sales. Knowledge-Based Systems, 10(7), 449–461.

Böcker, F. (1978). Die Bestimmung der Kaufverbundenheit von Produkten. Schriften zum Marketing, 7.

Brand, E., & Gerritsen, R. (1998). Associations and Sequencing. http://www.dbmsmag.com/9807m03.html

Brijs, T., Swinnen, G., Vanhoof, K., Wets, G. (2004). Building an association rules framework to improve product assortment decisions. Data Mining and Knowledge Discovery, 8(1), 7–23.

Brin, S., Motwani, R., Silverstein, C. (1998). Beyond market baskets: Generalizing association rules to dependence rules. Data Mining and Knowledge Discovery, 2(1), 39–68.

Dekimpe, M., Hanssens, D. (1995a). The persistence of marketing effects on sales. Marketing Science, 14(1), 1–21.

Dekimpe, M., Hanssens, D. (1995b). Empirical generalizations about market evolutions and stationarity. Marketing Science, 14(3), G106–G121.

Dekimpe, M., Hanssens, D. (1999). Sustained spending and persistent response: A new look at long-term marketing profitability. Journal of Marketing Research, 36(4), 397–412.

Dekimpe, M., Hanssens, D., Silva-Risso, J. (1999). Long-run effects of price promotions in scanner markets. Journal of Econometrics, 89, 269–291.

Dolado, J., Jenkinson, T., Sosvilla-Rivero, S. (1990). Cointegration and unit roots. Journal of Economic Surveys, 4, 249–273.

Dubé, J.-P. (2004). Multiple discreteness and product differentiation: Demand for carbonated soft drinks. Marketing Science, 23(1), 66–81.

Enders, W. (1995). Applied econometric time series. New York: Wiley.

Evans, L., Wells, G. (1983). An alternative approach to simulating VAR models. Economics Letters, 12(1), 23–29.

Giudici, P. (2003). Applied data mining. West Sussex, England: Wiley.

Hand, D., Mannila, H., Smyth, P. (2001). Principles of data mining. Massachusetts Institute of Technology.

Kim, J., Allenby, G. M., Rossi, P. E. (2002). Modeling consumer demand for variety. Marketing Science, 21(3), 229–250.

Manchanda, P., Ansari, A., Gupta, S. (1999). The “shopping basket”: A model for multicategory purchase incidence decisions. Marketing Science, 18(2), 95–114.

McAlister, L., Pessemier, E. (1982). Variety seeking behaviour: An interdisciplinary review. Journal of Consumer Research, 9(3), 311–322.

Nijs, V. R., Dekimpe, M. G., Steenkamp, J.-B.E.M., Hanssens, D. M. (2001a). The category-demand effects of price promotions. Marketing Science, 20(1), 1–22.

Nijs, V. R., Dekimpe, M. G., Steenkamp, J.-B.E.M., & Hanssens, D. M. (2001b). Tracing the impact of price promotions across categories, Working Paper, Kellogg School of Management.

Pauwels, K., Hanssens, D., Siddarth, S. (2002). The long-term effects of price promotions on category incidence, brand choice, and purchase quantity. Journal of Marketing Research, 39(4), 421–439.

Shocker, A., Bayus, B., Kim, N. (2004). Product complements and substitutes in the real world: The relevance of “other products”. Journal of Marketing, 68(1), 28–40.

Srinivasan, S., Leszczyc, P., Bass, F. (2000). Market share response and competitive interaction: The impact of temporary, evolving and structural changes in prices. International Journal of Research in Marketing, 17(4), 281–305.

Van den Poel, D., De Schamphelaere, J., Wets, G. (2004). Direct and indirect effects of retail promotions. Expert Systems with Applications, 27(1), 53–62.

Wang, F.-S., Shao, H.-M. (2004). Effective personalized recommendation based on time-framed navigation clustering and association mining. Expert Systems with Applications, 27(3), 365–377.

1. More generally, Y and Z can be sets of products instead of single products.
2. Including a trend variable in all equations allows us to estimate the system using OLS. Including a trend variable only in the equations that are trend-stationary would oblige us to estimate the system with SUR, which results in a heavy computational load given the number of systems to be estimated. See Nijs et al. (2001a) for a similar approach.
3. When there is no persistent effect, significant means significantly different from zero. When there is a persistent effect, significant means significantly different from the persistent effect.
4. The threshold value of 0.0157 (in percent) results from the requirement that the two products be sold together at least 1000 times in the observation period. Since there were 6,368,614 baskets in total, this is the same as demanding a minimum support of 1000/6,368,614 ≈ 0.000157, i.e. 0.0157%.
5. This assumption gives marketing mix variables causal priority over sales variables. For an application of this assumption, see Dekimpe and Hanssens (1995a).

The simplest association analysis is often referred to as market basket analysis. Within Rattle this is enabled when the Baskets button is checked. In this case, the data is thought of as representing shopping baskets (or any other type of collection of items, such as a basket of medical tests, a basket of medicines prescribed to a patient, a basket of stocks held by an investor, and so on). Each basket has a unique identifier, and the variable specified as an Ident variable in the Data tab is taken as the identifier of a shopping basket. The contents of the basket are then the items contained in the column of data identified as the target variable. For market basket analysis, these are the only two variables used.

To illustrate market basket analysis with Rattle, we will use a very simple dataset consisting of the DVD movies purchased by customers. Suppose the data is stored in the file dvdtrans.csv and consists of the following:

    ID,Item
    1,Sixth Sense
    1,LOTR1
    1,Harry Potter1
    1,Green Mile
    1,LOTR2
    2,Gladiator
    2,Patriot
    2,Braveheart
    3,LOTR1
    3,LOTR2
    4,Gladiator
    4,Patriot
    4,Sixth Sense
    5,Gladiator
    5,Patriot
    5,Sixth Sense
    6,Gladiator
    6,Patriot
    6,Sixth Sense
    7,Harry Potter1
    7,Harry Potter2
    8,Gladiator
    8,Patriot
    9,Gladiator
    9,Patriot
    9,Sixth Sense
    10,Sixth Sense
    10,LOTR
    10,Galdiator
    10,Green Mile
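To make the basket structure concrete, here is a small Python sketch (outside Rattle, which handles this step itself) that groups rows in this ID,Item layout into one set of items per basket. The function name `read_baskets` and the in-memory excerpt are assumptions for illustration; only the first three baskets of the data are included.

```python
import csv
import io
from collections import defaultdict

# Excerpt of the ID,Item data (first three baskets only), held in memory
# so the sketch does not depend on a file being present.
raw = """ID,Item
1,Sixth Sense
1,LOTR1
1,Harry Potter1
1,Green Mile
1,LOTR2
2,Gladiator
2,Patriot
2,Braveheart
3,LOTR1
3,LOTR2
"""


def read_baskets(handle):
    """Group rows by the ID column; each basket is the set of its items."""
    baskets = defaultdict(set)
    for row in csv.DictReader(handle):
        baskets[row["ID"]].add(row["Item"])
    return dict(baskets)


baskets = read_baskets(io.StringIO(raw))
for basket_id, items in baskets.items():
    print(basket_id, sorted(items))
```

The ID column plays the part of the Ident variable and the Item column the part of the target variable described above.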

We load this data into Rattle and choose the appropriate variable roles. In this case it is quite simple:

On the Associate tab (of the Unsupervised paradigm) ensure the Baskets check box is checked. Click the Execute button to identify the associations:

Here we see a summary of the associations found. There were 38 association rules that met the criteria of having a minimum support of 0.1 and a minimum confidence of 0.1. Of these, 9 were of length 1 (i.e., a single item that has occurred frequently enough in the data), 20 were of length 2 and another 9 of length 3. Across the rules the support ranges from 0.11 up to 0.56. Confidence ranges from 0.11 up to 1.0, and lift from 0.9 up to 9.0.
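For readers unfamiliar with the three measures, the following Python sketch computes support, confidence and lift for a single rule using their usual definitions. It uses only the first five baskets of the dataset, so the numbers it produces are not expected to match the Rattle summary; the function names are illustrative assumptions.

```python
# First five baskets of the DVD data, one set of items per basket.
baskets = [
    {"Sixth Sense", "LOTR1", "Harry Potter1", "Green Mile", "LOTR2"},
    {"Gladiator", "Patriot", "Braveheart"},
    {"LOTR1", "LOTR2"},
    {"Gladiator", "Patriot", "Sixth Sense"},
    {"Gladiator", "Patriot", "Sixth Sense"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(lhs, rhs, baskets):
    """Support of lhs and rhs together, relative to the support of lhs."""
    return support(lhs | rhs, baskets) / support(lhs, baskets)


def lift(lhs, rhs, baskets):
    """Confidence of the rule relative to the support of rhs alone."""
    return confidence(lhs, rhs, baskets) / support(rhs, baskets)


lhs, rhs = {"Gladiator"}, {"Patriot"}
print("support:", support(lhs | rhs, baskets))
print("confidence:", confidence(lhs, rhs, baskets))
print("lift:", lift(lhs, rhs, baskets))
```

A lift above 1 indicates that the two sides of the rule occur together more often than they would if they were independent, while a lift below 1 indicates the opposite.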

The lower part of the same textview contains information about the running of the algorithm:

We can see the variable settings used, noting that Rattle only provides access to a small subset of the settings (support and confidence). The output includes timing information for the various phases of the algorithm. For such a small dataset, the times are of course essentially 0!