Saturday, 19 September 2015

SAS Assignment #Praxis

Data Sets created are:
-Hosp
-data B29.hosp;
   do j = 1 to 1000;
      AdmitDate = int(ranuni(1234)*1200 + 15500);
      quarter = intck('qtr','01jan2002'd,AdmitDate);
      do i = 1 to quarter;
         if ranuni(0) lt .1 and weekday(AdmitDate) eq 1 then
            AdmitDate = AdmitDate + 1;
         if ranuni(0) lt .1 and weekday(AdmitDate) eq 7 then
            AdmitDate = AdmitDate - int(3*ranuni(0) + 1);
         DOB = int(25000*Ranuni(0) + '01jan1920'd);
         DischrDate = AdmitDate + abs(10*rannor(0) + 1);
         Subject + 1;
         output;
      end;
   end;
   drop i j;
   format AdmitDate DOB DischrDate mmddyy10.;
run;

- Sales data set
data B29.sales;
   input    EmpID     :       $4.
            Name      &      $15.
            Region    :       $5.
            Customer  &      $18.
            Date      : mmddyy10.
            Item      :       $8.
            Quantity  :        5.
            UnitCost  :  dollar9.;
   TotalSales = Quantity * UnitCost;
/*   format date mmddyy10. UnitCost TotalSales dollar9.;*/
   drop Date;
datalines;
1843 George Smith  North Barco Corporation  10/10/2006 144L 50 $8.99
1843 George Smith  South Cost Cutter's  10/11/2006 122 100 $5.99
1843 George Smith  North Minimart Inc.  10/11/2006 188S 3 $5,199
1843 George Smith  North Barco Corporation  10/15/2006 908X 1 $5,129
1843 George Smith  South Ely Corp.  10/15/2006 122L 10 $29.95
0177 Glenda Johnson  East Food Unlimited  9/1/2006 188X 100 $6.99
0177 Glenda Johnson  East Shop and Drop  9/2/2006 144L 100 $8.99
1843 George Smith  South Cost Cutter's  10/18/2006 855W 1 $9,109
9888 Sharon Lu  West Cost Cutter's  11/14/2006 122 50 $5.99
9888 Sharon Lu  West Pet's are Us  11/15/2006 100W 1000 $1.99
0017 Jason Nguyen  East Roger's Spirits  11/15/2006 122L 500 $39.99
0017 Jason Nguyen  South Spirited Spirits  12/22/2006 407XX 100 $19.95
0177 Glenda Johnson  North Minimart Inc.  12/21/2006 777 5 $10.500
0177 Glenda Johnson  East Barco Corporation  12/20/2006 733 2 $10,000
1843 George Smith  North Minimart Inc.  11/19/2006 188S 3 $5,199

;

- Blood data set
data B29.blood;
   infile datalines truncover;
   length Gender $ 6 BloodType $ 2 AgeGroup $ 5;
   input Subject 
         Gender 
         BloodType 
         AgeGroup
         WBC 
         RBC 
         Chol;
   label Gender = "Gender"
         BloodType = "Blood Type"
         AgeGroup = "Age Group"
         Chol = "Cholesterol";
datalines;
1    Female AB Young 7710   7.4  258
2    Male   AB Old   6560   4.7  .
3    Male   A  Young 5690   7.53 184
4    Male   B  Old   6680   6.85 .
5    Male   A  Young .      7.72 187
6    Male   A  Old   6140   3.69 142
7    Female A  Young 6550   4.78 290
8    Male   O  Old   5200   4.96 151
9    Male   O  Young .      5.66 311
10   Female O  Young 7710   5.55 .
11   Male   B  Young .      5.62 152
12   Female O  Young 7410   5.85 241
13   Male   O  Young 5780   4.37 .
14   Female O  Old   5590   6.94 152
15   Female A  Old   6520   6.03 217
16   Female O  Young 7210   5.17 193
17   Male   A  Old   .      5.63 .
18   Male   O  Old   6410   6.02 224
19   Female A  Old   6360   3.74 211
20   Male   A  Young 7580   5.13 179
21   Female A  Old   7150   6.35 200
22   Female A  Young 8710   5.12 211
23   Female O  Young 7660   4.91 .
24   Female B  Young 8280   6.14 .
25   Female AB Old   7480   4.7  183
;

- Missing data set
data B29.missing;
   input A $ B $ C $;
datalines;
X Y Z
X Y Y
Z Z Z
X X .
Y Z .
X . .

;

- data set for Inventory and New Product

data B29.inventory;
   input Model $ Price;
   format Price dollar8.2;
datalines;
M567 23.50
S888 12.99
L776 159.98
X999 29.95
M123 4.59
S776 1.99
;
data B29.newproducts;
   input Model $ Price;
   format Price dollar8.2;
datalines;
L939 10.99
M135 .75
;

- data set for purchase
data B29.purchase;
   input CustNumber Model $ Quantity;
datalines;
101 L776 1
102 M123 10
103 X999 2
103 M567 1
;

- data set for health
data B29.health;
   input Subj : $3.
         Height
         Weight;
datalines;
001 68 155
003 74 250
004 63 110
005 60 95
;

- data set for Psych
data B29.psych;
   input ID : $3. Ques1-Ques10 Score1-Score5;
datalines;
001 1 3 2 4 5 4 3 4 5 4 90 92 93 90 88
002 3 3 . . 3 4 5 5 1 . 95 . . 86 85
003 . . . . 5 5 4 4 3 3 88 87 86 85 84
004 5 3 4 5 . 5 4 3 3 . 78 78 82 84 .
005 5 4 3 2 1 1 2 3 4 5 92 93 94 95 99

;
- data set for nines
data B29.nines;
   infile datalines missover;
   input x y z (Char1-Char3)(:$1.) a1-a5;
datalines;
1 2 3 a b c 99 88 77 66 55
2 999 999 d c e 999 7 999
10 20 999 b b b 999 999 999 33 44
;

Performing Conditional Processing

Problem 1 :
 Using the SAS data set Hosp, use PROC PRINT to list observations for Subject
values of 5, 100, 150, and 200. Do this twice, once using OR operators and once
using the IN operator.

Note: Subject is a numeric variable

Solution:

Result:


Learning: 
  • Here we can see that two result but both are the same. Only difference is that in first result there is observation column and in second is not.
  • In second one, we have used noobs in proc print statement that result no observation.
  • In coding, for the first one, you can see that the result obtain through OR operator and second by IN operator.


Problem 2 :
 Using the Sales data set, list the observations for employee numbers (EmpID) 9888
and 0177. Do this two ways, one using OR operators and the other using the IN
operator.
Note: EmpID is a character variable.

Solution:
Result:
Learning:
  •  In this problem we have used Where statement to subset data, when your input data is indexed, the Where statement might be more efficient.
  • You may also use a Where statement in  a SAS procedure to subset the data being processed.

Problem 3 :
 Starting with the Blood data set, create a new, temporary SAS data set containing
all the variables in Blood plus a new variable called CholGroup. Define this new
variable as follows:
􀂃 CholGroup Chol
􀂃 Low Low – 110
􀂃 Medium 111 – 140
􀂃 High 141 – High
Use a SELECT statement to do this.

Solution:

Result:

Learning:
  • Here we have used Select statement that is alternative of If and Else If statements. Select Chol group, if the statement is true, the statement following the when expression is executed and control skip to the end of the select group.

Performing Iterative Processing: Looping

Problem 4:
   
 Run the program here to create a temporary SAS data set called Vitals:
data vitals;
input ID : #3.
Age
Pulse
SBP
DBP;
label SBP = "Systolic Blood Pressure"
DBP = "Diastolic Blood Pressure";
datalines;
001 23 68 120 80
002 55 72 188 96
003 78 82 200 100
004 18 58 110 70
005 43 52 120 82
006 37 74 150 98
007 . 82 140 100
;

Solution:
Result:
Learning:
  • In this problem, we have learnt how to use label statement. Label statement basically use for column heading  this listing here is more readable but here label statement is not working.
  • For run the program through label you have to give the label statement in proc step.


Using this data set, create a new data set (NewVitals) with the following new
variables:
For subjects less than 50 years of age:
If Pulse is less than 70, set PulseGroup equal to Low;
otherwise, set PulseGroup equal to High.
If SBP is less than 130, set SBPGroup equal to Low;
otherwise, set SBPGroup equal to High.
For subjects greater than or equal to 50 years of age:
If Pulse is less than 74, set PulseGroup equal to Low;
otherwise, set PulseGroup equal to High.
If SBP is less than 140, set SBPGroup equal to Low;
otherwise, set SBPGroup equal to High.
You may assume there are no missing values for Pulse or SBP.

Solution:

Result:
Learning:
  •  In this problem, we have learnt how to use if else statement because question is itself self explanatory only we have to follow that step.
  • If statement is true then execute IF statement otherwise go to the Else statement like if Pulse less than (lt) 70 then it groups in 'Low ' pulse group otherwise 'High' pulse group.
Problem 5:

Run the program here to create a temporary SAS data set (MonthSales):

data monthsales;
input month sales @@;
/* add your line(s) here */
datalines;
1 4000 2 5000 3 . 4 5500 5 5000 6 6000 7 6500 8 4500
9 5100 10 5700 11 6500 12 7500
;
Modify this program so that a new variable, SumSales, representing Sales to date, is
added to the data set. Be sure that the missing value for Sales in month 3 does not
result in a missing value for SumSales.

Solution:


Result:
Learning: 
  • In this sum, we have created a new variable that is Sum Sales that consist cumulative sales.
  • Here we have also used retain statement for Sum Sales in starting the Sum Sales should be zero. for missing value we have given the not missing statement. if we didn't give the not missing statement then it prints the Sum Sales Value till observation 2 after that it prints only missing value in Sum Sales column.
Problem 6:

Count the number of missing values for the variables A, B, and C in the Missing data
set. Add the cumulative number of missing values to each observation (use variable
names MissA, MissB, and MissC). Use the MISSING function to test for the missing
values.

Solution:

Result:


Learning:
  • Here we have counted the total number of missing value for each individual column with the help of if (missing) statement 
  • As you can see in result that column MissA has total zero missing value while MissB amd MissC has one and three respectively.
Subsetting and Combining SAS Data Sets

Problem 7:

Using the SAS data set Blood, create two temporary SAS data sets called Subset_A
and Subset_B. Include in both of these data sets a variable called Combined equal to
.001 times WBC plus RBC. Subset_A should consist of observations from Blood
where Gender is equal to Female and BloodType is equal to AB. Subset_B should
consist of all observations from Blood where Gender is equal to Female, BloodType
is equal to AB, and Combined is greater than or equal to 14.

Solution:



Result:



Learning:
  • Here, there are two data set one is subset_a and and another is subset_b .
  • In data set subset_a, we have used Where condition with And statement while in subset_b, If with and statement but both doing the same thing first subsetting then combining .
  • This is most widely used in subsetting and combining statement.

Problem 8:

Print out the observations in the two data sets Inventory and NewProducts. Next,
create a new temporary SAS data set (Updated) containing all the observations in
Inventory followed by all the observations in NewProducts. Sort the resulting data
set and print out the observations.

Solution:

Result:

Learning:
  • In this problem we have used proc sort data that followed the by statement like here proc sort data =updated by model.
  • In result you can see that the column start with model column that is the beauty of proc sort data
  • this proc sort data is too large then it is very helpful, after sorting we can easily do other operation from the table

Problem 9: (Merging two data sets)

Merge the Purchase and Inventory data sets to create a new, temporary SAS data set
(Pur_Price) where the Price value found in the Inventory data set is added to each
observation in the Purchase data set, based on the Model number (Model). There are
some models in the Inventory data set that were not purchased (and, therefore, are
not in the Purchase data set). Do not include these models in your new data set.
Based on the variable Quantity in the Purchase data set, compute a total cost
(TotalCost) equal to Quantity times Price in this data set as well.

Solution:


Result:

Learning:
  • Sas use the term merge to describe the process of combining variables (columns) from two or more data sets like same thing that happened in this problem.
  • First, we have sort the both data set by the Model variable that means both table starting variable is Model .
  • Now we have merge both data set B29.inventory and B29.purchase in one data set that is B29.pur_price.
Working with Numeric Functions

Problem 10:

Using the SAS data set Health, compute the body mass index (BMI) defined as the
weight in kilograms divided by the height (in meters) squared. Create four other
variables based on BMI: 1) BMIRound is the BMI rounded to the nearest integer, 2)
BMITenth is the BMI rounded to the nearest tenth, 3) BMIGroup is the BMI rounded
to the nearest 5, and 4) BMITrunc is the BMI with a fractional amount truncated.
Conversion factors you will need are: 1 Kg equals 2.2 Lbs and 1 inch = .0254 meters.

Solution:


Result:

Learning:
  • Two of the most useful function are the ROUND and INT. as the name suggest round is used to round numbers.
  • The INT function returns the integer portion of a numeric value like in BMIGroup number is round by thats why in that column every number is multiple of 5.
  • Typical values for round-off units are .1, .01, .001 and so on.
Problem 11:

The SAS data set Psych contains an ID variable, 10 question responses (Ques1–
Ques10), and 5 scores (Score1–Score5). You want to create a new, temporary SAS
data set (Evaluate) containing the following:
a. A value (ScoreAve) consisting of the mean of the three highest Score values. If
there are fewer than three non-missing score values, ScoreAve should be
missing.
b. An average of Ques1–Ques10 (call it QuesAve) if there are seven or more nonmissing
values.
c. A composite score (Composit) equal to ScoreAve plus 10 times QuesAve.

Solution:

Result:
Learning:
  • In this problem we have learnt, how and why use Largest function.
  • Like in this sum, we have used (mean(largest(2,of  Score1-Score5))) that returns the mean of two largest value from the Score1-Score5.
  • We can also use max,min,sum,etc in the place of mean.

Working with Character Functions

Problem 12: 

  Look at the following program and determine the storage length of each of the
variables:
data storage;
length A $ 4 B $ 4;
Name = 'Goldstein';
AandB = A || B;
Cat = cats(A,B);
if Name = 'Smith' then Match = 'No';
else Match = 'Yes';
Substring = substr(Name,5,2);
run;
A _________________
B _________________
Name _________________
AandB _________________
Cat _________________
Match _________________
Substring _________________

Solution:


Result:


Learning:
  • Here we have used length function that returns the length of a character value but it not counts trailing blanks.
  • Lengthc returns the storage length of a string for each individual variable.

Problem 13:


Here is a listing of data set Names_And_More:
Listing of Data Set LEARN.NAMES_AND_MORE
Name Phone Height Mixed
Roger Cody (908)782-1234 5ft. 10in. 50 1/8
Thomas Jefferson (315) 848-8484 6ft. 1in. 23 1/2
Marco Polo (800)123-4567 5Ft. 6in. 40
Brian Watson (518)355-1766 5ft. 10in 89 3/4
Michael DeMarco (445)232-2233 6ft. 76 1/3
Create a new, temporary SAS data set (Names_And_More) using the permanent SAS
data set Names_And_More with the following changes:
a. Name has only single blanks between the first and last name.
b. Phone contains only digits (and is still a character value).

Solution:



Result:
Learning: 
  • In this we have used compbl that means compress blank for converting blanks into a single blank
  • We can also convert the case of observation,here Name convert in only single blank between first and last name like Roger Cody. Previously it consists five blanks
  • we  have used kd in Phone variable that means keep digits (deletes everything else)

Working with Arrays


Problem 14:

Using the SAS data set Nines, create a new temporary SAS data set (Nonines) where
all values of 999 are replaced by SAS missing values. Do this without explicitly
naming the numeric variables in data set Nines (use _NUMERIC_ when you define
your array).

Solution:


Result:

Learning:
  • Arrays are used to perform a similar operation on a group variables.
  • In this problem, we have Sas data set B29.nines that contains several numeric variable.
  • this data set used a value of 999 whenever there was a missing value. Values are like 999 or 9999 to represent this missing value.
  • All the values from the data set that is 999 is replaced by missing value with the help of this function


Presenting and Summarizing Your Data

Problem 15:

List the first 10 observations in data set Blood. Include only the variables Subject,
WBC (white blood cell), RBC (red blood cell), and Chol. Label the last three
variables “White Blood Cells,” “Red Blood Cells,” and “Cholesterol,” respectively.
Omit the Obs column, and place Subject in the first column. Be sure the column

headings are the variable labels, not the variable names.

Solution:



Result:


Learning:
  • In this sum, we have to find the first 10 observation from a large blood data set.
  • For that we have used title statement with double quotes that is First 10 observation in blood.
  • To run only first 10, we have obs statement in proc print step that is here like (obs=10) with label.
  • Here we have also used label statement for column name to be more readable WBC means White Blood cells and RBC means Red Blood Cells etc.

Creating Customized Reports

Problem 16:


Use PROC REPORT to create a report, as shown here:
Note: The data set is Blood, and the variables to be listed are Subject, WBC, and
RBC. All three variables are numeric (be careful). There is a line after the
heading.
First 5 Observations from Blood Data Set
White Red
Subject Blood Blood
Number Cells Cells
___________________

1 7,710 7.40
2 6,560 4.70
3 5,690 7.53
4 6,680 6.85
5 . 7.72

Solution:


Result: