SDA 3.5 Documentation for CORRTAB
NAME
corrtab - crosstabular breakdown of correlations
USAGE
corrtab -b batchfile
DESCRIPTION
CORRTAB displays the correlations between two variables (the X-
variable and the Y-variable) in a crosstabular format.
Ordinarily this program is invoked by the Web interface for the
SDA programs, and the user does not have to deal with the
keywords given in this document. Output from the program is in
HTML, which can be viewed with a Web browser.
It is also possible to run the program directly by preparing a
command file, which specifies the variables to be analyzed and
the options to use. This document explains how to prepare such a
file. The name of this batch command file is specified to the
program after the ‘-b’ option flag.
KEYWORDS
The batch file contains specifications for the analysis. These
specifications are given in the form "keyword = something" with
one keyword per line. Keywords may be given in any order, either
in upper or in lower case. The valid keywords are as follows
(with significant characters shown in capital letters):
Keyword        Possible Specification           Default (if no keyword)
_____________________________________________________________________
STUdy=         path of dataset directory        Look for variables in
                                                current directory only
XVAR=          name(s) of 1st variable          REQUIRED
               (separated by spaces/commas)
YVAR=          name(s) of 2nd variable          REQUIRED
ROWvar=        variable name(s)                 REQUIRED
               (separated by spaces/commas)
COLUMNvar=     variable name(s)                 No column variable
CONtrolvar=    variable name(s)                 No control variable
Weight=        name of weight variable          No weighting
Filter=        name(s) and codes of filter      No filter
               variable(s)
COLORcoding=   Yes                              No color coding of
                                                coefficients or headings
GVARCase=      LOWER or UPPER                   No force to lower/upper case
LAnguagefile=  Name of file with non-English    English labels on
               labels and messages              output
RUNtitle=      Title or comments for run        No title or comments
               (1 line only)
SAvefile=      filename to receive output       Output sent to screen
               (overwrite existing file)        (standard output)
TExt=          Yes                              No text for variables
Statistic to display
The main statistic to display in each cell of the table can be
one of two options: the Pearson correlation coefficient, or the
log of the odds ratio. The default main statistics to display are
the Pearson correlation coefficients.
Instead of displaying the main statistic directly, it is possible
to display the DIFFERENCE from something else, by adding the
‘difference=’ keyword.
For each statistic the user can specify the number of desired
decimal places (in parentheses, after the name of the statistic).
See below for the default number of decimals for each statistic.
Keyword        Possible Specification           Default (if no keyword)
_____________________________________________________________________
MAINstat=      CORR (ndec)                      Display correlations,
               LOGodds (ndec)                   with default number
                                                of decimal places
DIFference=    Overall (ndec)                   Display main statistic
               (diff from overall correlation)
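As a rough illustration of what the two keywords above produce,
the following Python sketch computes a Pearson correlation within
each cell of a row-by-column breakdown and, optionally, its
difference from the correlation over all cases. The function
names, the record layout, and the sign convention (cell minus
overall) are assumptions made for this illustration; they are not
taken from the CORRTAB source.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (None if undefined)."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def corr_by_cell(records, difference=False):
    """records: iterable of (row, col, x, y) tuples.

    Returns {(row, col): r}. With difference=True each value is r minus
    the overall r (sign convention assumed for this sketch).
    """
    cells = defaultdict(lambda: ([], []))
    all_x, all_y = [], []
    for row, col, x, y in records:
        cells[(row, col)][0].append(x)
        cells[(row, col)][1].append(y)
        all_x.append(x)
        all_y.append(y)
    overall = pearson(all_x, all_y)
    table = {}
    for key, (xs, ys) in cells.items():
        r = pearson(xs, ys)
        if difference:
            r = None if r is None or overall is None else r - overall
        table[key] = r
    return table
```

Calling corr_by_cell(records) corresponds loosely to the default
‘mainstat=corr’, and corr_by_cell(records, difference=True) to
adding ‘difference=overall’.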
Optional statistics
In addition to the main statistic, one or more of the following
optional statistics can be displayed in each cell (with the
desired number of decimal places in parentheses if the defaults,
listed below, are not satisfactory). Note that the ‘statistics=’
keyword can be repeated on subsequent lines if necessary.
Keyword        Possible Specification           Default (if no keyword)
_____________________________________________________________________
STAtistics=
               SE (ndec)                        No standard errors
               TSTATistic (ndec)                No t-statistic in cells
               Ncases                           No unweighted N’s
               WNcases (ndec)                   No weighted N’s
DICHOTOMIZING VARIABLES
The calculation of the odds ratio assumes that the two variables
have only two categories each. If the log of the odds ratio is
requested, CORRTAB treats the X-variable and the Y-variable as
dichotomies, regardless of the number of categories they may
actually have. The minimum valid value of each variable is
treated as the base category (coded 0), and all valid values
greater than the minimum are combined into the other category
(coded 1). If this default dichotomization is not appropriate
for a particular variable, you can specify another temporary
recode after the variable name is given.
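The default dichotomization described above can be sketched in
Python as follows. This is an illustration only: the function
names are hypothetical, missing data are represented by None, the
natural logarithm is assumed, and returning None when a cell of
the 2x2 table is empty is a choice made for this sketch rather
than documented CORRTAB behavior.

```python
from math import log


def dichotomize(values):
    """Code the minimum valid value as 0 and every larger valid value as 1.

    Missing data are represented here by None (an assumption of this
    sketch) and are passed through unchanged.
    """
    valid = [v for v in values if v is not None]
    if not valid:
        return list(values)
    base = min(valid)
    return [None if v is None else int(v > base) for v in values]


def log_odds_ratio(x, y):
    """Natural log of the odds ratio for two 0/1 sequences.

    Returns None if any cell of the 2x2 table is empty.
    """
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for a, b in zip(x, y):
        if a is not None and b is not None:
            counts[(a, b)] += 1
    if 0 in counts.values():
        return None
    return log(counts[(0, 0)] * counts[(1, 1)]
               / (counts[(0, 1)] * counts[(1, 0)]))
```

A variable coded 1 through 5 would thus be split into "1" versus
"2 through 5" before the odds ratio is computed.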
CALCULATION OF STANDARD ERRORS
If standard errors are requested, they are computed with the
standard formulas for each statistic or its transformation. Note
that the confidence interval for the Pearson correlation
coefficient is not symmetric; therefore, there is no single
standard error that applies in both directions. CORRTAB outputs
the average distance of the upward and the downward confidence
band for one standard error (based on the retransformation of
Fisher’s Z), since that number is ordinarily a useful
approximation. However, if cell sizes are small or the
correlations of interest are close to zero or one, this average
may not be good enough to make statistical inferences. In such a
case (or when in doubt) use Fisher’s transformation and its
associated standard error to carry out statistical tests on the
corresponding Pearson correlations.
Note that the calculation of the standard error of the
correlation coefficient in each cell is based by default on the
UNWEIGHTED number of cases, even if a weight variable has been
used for calculating the correlation coefficient. Ordinarily
this procedure will generate a more appropriate statistical test
than one based on the weighted N in each cell.
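One reading of the averaging procedure described above is
sketched below in Python. It uses the usual standard error of
Fisher's Z, 1/sqrt(n - 3), with n taken as the unweighted number
of cases. Whether CORRTAB applies exactly these formulas is an
assumption; the function name is hypothetical.

```python
from math import atanh, sqrt, tanh


def fisher_se(r, n):
    """Return (se_z, avg_band) for correlation r based on n unweighted cases.

    se_z     : standard error of Fisher's Z, 1 / sqrt(n - 3)
    avg_band : average of the upward and downward distances from r to
               the one-standard-error limits after transforming back
               from Z
    """
    if n <= 3 or abs(r) >= 1:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = atanh(r)                      # Fisher's Z transformation
    se_z = 1.0 / sqrt(n - 3)
    upper = tanh(z + se_z)            # back-transformed upper limit
    lower = tanh(z - se_z)            # back-transformed lower limit
    avg_band = ((upper - r) + (r - lower)) / 2.0
    return se_z, avg_band
```

For a statistical test it is safer to work with z and se_z
directly, as recommended above, since the upward and downward
distances differ whenever r is not zero.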
ABBREVIATIONS
Keywords can usually be abbreviated down to the number of
characters required to differentiate them from other keywords.
Sometimes only one character is required. The keyword for the
weight variable, for instance, can be given as "weight=" or
"wei=" or even "w=". Either upper or lower case may be used. In
the list of keywords above, the minimum string of characters
required for each specification is shown in capital letters.
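The abbreviation rule can be illustrated with a short Python
sketch. It is not the SDA parser: the matching rule used here (the
input must contain at least the capitalized characters and must be
a prefix of the full keyword) is an assumption, and spellings
longer than the listed keyword are not modeled.

```python
# Keywords copied from the tables above; capitals mark the required characters.
KEYWORDS = [
    "STUdy", "XVAR", "YVAR", "ROWvar", "COLUMNvar", "CONtrolvar",
    "Weight", "Filter", "COLORcoding", "GVARCase", "LAnguagefile",
    "RUNtitle", "SAvefile", "TExt", "MAINstat", "DIFference", "STAtistics",
]


def min_prefix(keyword):
    """Leading run of capital letters, lowercased (e.g. 'STUdy' -> 'stu')."""
    i = 0
    while i < len(keyword) and keyword[i].isupper():
        i += 1
    return keyword[:i].lower()


def resolve(name):
    """Map a possibly abbreviated keyword to its full lowercase name, or None."""
    name = name.strip().lower()
    for keyword in KEYWORDS:
        full = keyword.lower()
        if name.startswith(min_prefix(keyword)) and full.startswith(name):
            return full
    return None
```

Under this rule resolve("w"), resolve("wei") and resolve("WEIGHT")
all give "weight", while resolve("col") gives None because it
cannot distinguish COLUMNvar from COLORcoding.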
COMMENTS
Anything on a line beginning with "#" is ignored by the batch
processor and can therefore be used for comments. Blank lines
are also ignored.
DECIMAL PLACES
Each statistic has a default number of decimal places with which
it will be printed. To change the default, put the desired
number of decimals in parentheses after specifying the statistic.
The default numbers of decimal places are as follows:
- main statistics: 2 (correlations, logs of odds ratios, and
their differences)
- se: 3
- Tstatistic: 2
- wncases: 0
It is not necessary to request the ‘correlation’ main
statistic unless you want to change the number of decimal places.
Unless otherwise specified, the Pearson correlation coefficient
is the statistic that will be displayed.
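A minimal Python sketch of how a specification such as
"se(4) tstatistic" could be split into statistic names and decimal
places, falling back on the defaults listed above. The function
name and the lowercase dictionary keys are hypothetical, and full
statistic names are assumed (abbreviations are not resolved).

```python
import re

# Defaults taken from the list above; the keys are lowercase names
# chosen for this sketch. 'ncases' has no decimals and maps to None.
DEFAULT_DECIMALS = {"corr": 2, "logodds": 2, "overall": 2,
                    "se": 3, "tstatistic": 2, "wncases": 0}

# A name, optionally followed by a number of decimals in parentheses.
_SPEC = re.compile(r"([A-Za-z]+)\s*(?:\(\s*(\d+)\s*\))?")


def parse_stats(spec):
    """Turn 'se(4) tstatistic' into [('se', 4), ('tstatistic', 2)]."""
    result = []
    for name, ndec in _SPEC.findall(spec):
        name = name.lower()
        if ndec == "":
            ndec = DEFAULT_DECIMALS.get(name)
        else:
            ndec = int(ndec)
        result.append((name, ndec))
    return result
```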
MENTION OF KEYWORD SUFFICIENT
The form ‘keyword=yes’ may be shortened to ‘keyword’. That is,
the ‘=yes’ may be omitted for those options which require no
further specification. For example, ‘text=yes’ can be shortened
to ‘text’.
ORDER OF PROCESSING LISTS
When more than one variable is given for the x, y, row, column,
or control variable specifications, the tables are produced in
the following order: Tables for EACH of the control variables
are produced with the FIRST column variable and the FIRST row
variable and the FIRST pair of x and y variables. Then the whole
list of control variables is processed again for the SECOND
column variable and the FIRST row variable and the FIRST pair of
x and y variables; and so on until the whole set of column
variables has been processed. Then the whole series is repeated
for the SECOND row variable; and so on until all the row
variables have been used. Then the whole series is repeated for
the SECOND Y-variable; and so on until all the Y-variables have
been used. Finally, the whole series is repeated for each
succeeding X-variable.
Briefly, the variables will cycle in the following order:
control, column, row, Yvar, Xvar. All of the tables will be
produced using the same weight, filters, and other options.
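The cycling order can be reproduced with nested iteration. The
Python sketch below is an illustration of the order only; the
function name is hypothetical, and treating an omitted column or
control list as a single empty slot is an assumption.

```python
from itertools import product


def table_order(xvars, yvars, rows, columns=(), controls=()):
    """List (x, y, row, column, control) in the order described above.

    An empty column or control list is replaced by a single None so
    that the remaining variables still cycle.
    """
    columns = list(columns) or [None]
    controls = list(controls) or [None]
    # product() varies its LAST argument fastest, so the control
    # variable cycles first and the X-variable last.
    return list(product(xvars, yvars, rows, columns, controls))
```

With two entries in each of the x, row, column, and control lists
and one y variable, the first two tables differ only in the
control variable, and the X-variable changes only after eight
tables have been produced.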
REPETITION OF KEYWORDS
If there is not enough room on a line to list all of the desired
variables, the keyword can be repeated on a new line, and more
variables can be listed. In such a case the second list is
appended to the first list, for purposes of generating tables.
This appending feature applies to the keywords for specifying the
x and y variables, the row, column, control, and filter
variables, and the ‘statistics=’ keyword. If other keywords are
repeated, the program will print an error message and stop.
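The batch-file rules described in the preceding sections (comment
and blank lines ignored, a bare keyword meaning ‘keyword=yes’,
list keywords appended when repeated, other repeats rejected) can
be put together in a small Python sketch. This is not the SDA
batch processor: the function name is hypothetical, full lowercase
keyword names are assumed, and codes in parentheses are assumed to
contain no spaces or commas.

```python
# Keywords whose repeated values are appended; full lowercase names are
# assumed here (abbreviations are not resolved in this sketch).
APPENDABLE = {"xvar", "yvar", "rowvar", "columnvar", "controlvar",
              "filter", "statistics"}


def parse_batch(text):
    """Parse batch-file text into a {keyword: value} dictionary."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        keyword, sep, value = line.partition("=")
        keyword = keyword.strip().lower()
        # A bare keyword is shorthand for 'keyword=yes'.
        value = value.strip() if sep else "yes"
        if keyword in APPENDABLE:
            # Items may be separated by spaces or commas.
            items = value.replace(",", " ").split()
            options.setdefault(keyword, []).extend(items)
        elif keyword in options:
            raise ValueError("keyword repeated: " + keyword)
        else:
            options[keyword] = value
    return options
```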
EXAMPLES OF BATCH FILES
Basic example
study = /sa/nes84
xvar = spend
yvar = spend2
row = education
column = gender
savefile = mytables
Using more options
Specify multiple sets of variables, redefine some ranges,
and use weight and filter variables.
xvar = spend spend2 spend3
yvar = age educ
row = var1(1-9) var2 var3(0-9)
column = var3, var4
weight= wtvar
filters= var21(1-3) var30(1)
savefile = mytables
Differences and other options
Put differences instead of original correlations in each cell,
and request some text options.
xvar = spend
yvar = spend2
row = var1 var2
column = var4 var5
# Display differences (with 3 decimal places) from the overall
# correlation coefficient
differences = overall(3)
# Request that full text of the variables be printed,
# and put a run title or comment on the top of each page
text= yes
runtitle= Test run to demonstrate program
savefile= mytables
CSM, UC Berkeley
April 12, 2011