    data direct;
      input x y;
      cards;
    1 17.5
    3 20.5
    ;

Here is an example of data input from another file:

    data pulse;
      infile '/p/stat/Data/MJ/pulse.dat' missover;
      input x y;

The names

    data a;                        create new data set named "a"
    input x y z;                   input 3 numbers at a time as variables x,y,z
    input trt $ x y;               input treatment "trt" as a character string
                                   and x,y as numbers. Note the dollar sign ($).
    infile 'blah.dat' missover;    use file "blah.dat" for the data;
                                   "missover" sets variables with no value on
                                   the line to missing rather than going to a
                                   new line (must appear BEFORE the input phrase)
    infile 'blah.dat' firstobs=2;  start reading at line 2, skipping the first
                                   line; handy when the first line documents
                                   column names
    infile 'blah.dat' lrecl=2000;  allow for really long records
    datalines;                     same as cards
    cards;                         read data from following lines
                                   (must appear AFTER the input phrase)
    ;                              end of data entry for "cards" phrase
                                   (good convention, but not required)

Data values must have spaces between them (tabs can cause problems on some systems). All values must be on the same line if using the

    data logs;
      set direct;
      logy = log(y);

This creates a new data set

    data a; set b;     create data set "a" using existing set "b"
    z = log(y);        create variable z as natural log of variable y
    z = log10(y);      log base 10
    z = sqrt(y);       square root
    z = x*y;           multiplication (+ addition) (- subtraction) (/ division)
    z = y**2;          exponent: "y squared" or "y to the 2nd power"
    z = y**0.5;        "y to the 1/2 power" (same as sqrt(y))
    z = x**-2;         negative exponent: "1 over (x squared)"
    z = sin(x);        trigonometric sine function of x (also cos(x), tan(x), ...)
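As a minimal sketch of how these assignments fit into a complete data step, the example below reuses the two observations from the "direct" example and stores each transformation in its own variable. The data set name "b" matches the table above; the variable names logy, sqrty and ysq are made up for illustration.

```sas
data b;                    /* small input data set for illustration */
  input x y;
  cards;
1 17.5
3 20.5
;
data a;
  set b;                   /* start from existing data set b */
  logy  = log(y);          /* natural log */
  sqrty = sqrt(y);         /* square root */
  ysq   = y**2;            /* y squared */
run;
proc print data=a;         /* list x, y and the three new variables */
run;
```

Giving each result its own name keeps all of the transformed values; assigning every result to z, as in the table, would keep only the last one.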

    data a; set b;
      z = sqrt(count);           /* counts (Poisson distribution) */
                                 /* variance proportional to mean */
      z = log(conc);             /* concentrations, weights (log normal) */
                                 /* SD proportional to mean */
                                 /* constant coefficient of variation (CV) */
      z = arsin(sqrt(prop));     /* proportions (0-1) */
      z = arsin(sqrt(pct/100));  /* percentages (0-100) */
                                 /* (Binomial distribution) */
                                 /* variance highest in middle */
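To see why the square root is suggested for counts, here is a rough simulation sketch. The data set name "counts", the group means 2, 8 and 32, the seed 12345 and the 200 replicates are all arbitrary choices for illustration, not values from any real study.

```sas
data counts;
  do mu = 2, 8, 32;               /* hypothetical Poisson means */
    do rep = 1 to 200;
      count = ranpoi(12345, mu);  /* Poisson count with mean mu */
      z = sqrt(count);            /* square root transformation */
      output;
    end;
  end;
run;
proc means data=counts mean var;  /* mean and variance for each mu */
  class mu;
  var count z;
run;
```

The variance of count should grow roughly in step with mu, while the variance of z should vary much less across groups (it tends toward about 0.25 as the mean gets large).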

    data other; set big;   /* create other from big */
      if x > 10;           /* only use these cases */

Suppose you had data set

    data trtonly; set field;           /* create trtonly from field */
      if trt = 'control' then delete;  /* delete control group */

Here is some more detail on the

    g = 0;                            /* g=0 for large x */
    if x < 10 then g = 1;             /* g=1 for small x */
    if y = 99 then y = .;             /* recode 99 as missing data */
    if y = . then y = 0;              /* recode missing data as 0 */
    if z < 10 or y > 10 then x = 5;   /* examples of union (or) */
    if z < 10 and y > 10 then x = 6;  /* and intersection (and) */
    if x <= 10;                       /* keep only x at most 10 */
    if x >= 10;                       /* keep only x at least 10 */
    if not (x = 10);                  /* keep only if x is not 10 */

You already saw how to add variables in transformations above. You can drop variables:

    data a; set b;
      z = log(y);   /* create new variable z */
      drop y;       /* drop old variable y */

Dropping is usually NOT done, because the cost of carrying unused variables is very small (unless you have a lot of data!). However, it is sometimes useful when the data need to be presented in a different way. For instance,

    data abc;
      input n0 n1 n2 n3 n4 n5;
      cards;
    1.4 1.5 1.2 2.1 2.1 2.8
    1.7 1.4 1.0 1.4 1.7 2.1
    1.1 1.9 2.5 2.6 2.1 2.2
    1.7 1.3 1.1 1.0 2.0 1.8
    1.0 1.8 1.5 1.4 2.2 2.3
    ;
    data resps; set abc;
      resp = n0; level = 0; output;
      resp = n1; level = 1; output;
      resp = n2; level = 2; output;
      resp = n3; level = 3; output;
      resp = n4; level = 4; output;
      resp = n5; level = 5; output;
      drop n0--n5;

Basically, the

    data a;
      do i=1 to 10;
        uni=ranuni(0);  /* an argument of 0 uses the clock as a seed */
                        /* otherwise, use a 5 to 7 digit odd number */
        output;
      end;

Note the use of a

    x = ranuni(seed);              /* uniform between 0 & 1 */
    x = a+(b-a)*ranuni(seed);      /* uniform between a & b */
    x = ranbin(seed,n,p);          /* binomial size n prob p */
    x = rancau(seed);              /* cauchy with loc 0 & scale 1 */
    x = a+b*rancau(seed);          /* cauchy with loc a & scale b */
    x = ranexp(seed);              /* exponential with scale 1 */
    x = ranexp(seed) / a;          /* exponential with rate a (mean 1/a) */
    x = a-b*log(ranexp(seed));     /* extreme value loc a & scale b */
    x = rangam(seed,a);            /* gamma with shape a */
    x = b*rangam(seed,a);          /* gamma with shape a & scale b */
    x = 2*rangam(seed,a);          /* chi-square with d.f. = 2*a */
    x = rannor(seed);              /* normal with mean 0 & SD 1 */
    x = a+b*rannor(seed);          /* normal with mean a & SD b */
    x = ranpoi(seed,a);            /* poisson with mean a */
    x = rantri(seed,a);            /* triangular with peak at a */
    x = rantbl(seed,p1,p2,p3);     /* random from (1,2,3) with probs */
                                   /* p1,p2,p3 */

The

    data uniform;
      do i = 1 to 20;
        x = ranuni(0);
        output;
      end;
    data a; merge b uniform;
    proc sort; by x;
    data c; set a;            /* _N_ = line number */
      trt = ceil(_N_ / 5);    /* ceil = round up to next integer */
    proc sort; by id;
    proc print; var id trt;

Here is a randomized complete block design, with 3 blocks and 4 treatments per block. We assign the treatments 1,2,3,4 at random to the 4 sites within a block.

    data a;
      do block = 1 to 3;
        do site = 1 to 4;
          x = ranuni(0);
          output;
        end;
      end;
    proc sort; by block x;
    data c; set a;
      trt = 1 + mod(_N_ - 1, 4);  /* mod = remainder of (_N_ - 1)/4 */
    proc sort; by block site;
    proc print; var block site trt;
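One possible check on the randomization, assuming the data set c created above is still available: cross-tabulate block by treatment. This is only a sketch of a sanity check, not part of the design itself.

```sas
proc freq data=c;
  tables block*trt / norow nocol nopercent;  /* show cell counts only */
run;
```

Every cell of the table should contain a count of 1, since each of the 4 treatments is assigned to exactly one site in each of the 3 blocks.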


Last modified: Tue Feb 6 14:12:35 1996 by Brian Yandell
Tue Feb 14 11:09:50 1995 by Stat Www
*(statwww@stat.wisc.edu)*