Friday, 2 October 2015

Ch 20 - Generating High-Quality Graphics

Chapter 20(Generating High-Quality Graphics)


This chapter introduces some basic concepts behind SAS/GRAPH software. Here you
will see how to create simple bar charts, scatter plots, and line graphs. There are many
excellent books and manuals available from SAS Press.

Chapter#20 and Problem#3

Using the SAS data set Bicycles, produce a vertical bar chart showing the
distribution of Total Sales (TotalSales). Use midpoints of 0 to 12,000, with intervals
of 2,000.

Solution:

Result:


Chapter#20 and Problem#9

Using the SAS data set Stocks, produce a scatter plot of Price versus Date. Use the
dot as the plotting symbol and connect the dots with a smooth line.

Solution:

Result:







Back to Index

Ch19 - Output Delivery System

Chapter 19 (Output Delivery System)



One of the most important changes in SAS, the introduction of Output Delivery System called as ODS. Each SAS procedure produce its unique output . This output was difficult to incorporate into other documents, such as Microsoft Office Word orMicrosoft Office PowerPoint.The Output Delivery System changed all that. Now each SAS procedure creates outputobjects that can be sent to such destinations as HTML, RTF, PDF, and SAS data sets.
 Not only can you send your SAS output to all of these formats, you can, if you are brave enough, customize the output’s fonts, colors, size, and
layout.

Chapter#19 and Problem#1

Run the following program, sending the output to an HTML file. Issue the
appropriate commands to prevent SAS from creating a listing file. See Chapter 16,
Problem 2, for the note about creating formats for this data set.
title "Sending Output to an HTML File";
proc print data=learn.college(obs=8) noobs;
run;
proc means data=learn.college n mean maxdec=2;
var GPA ClassRank;
run;

Solution:
Result:


Chapter#19 and Problem#3

Run the same two procedures as shown in Problem 1, except use the JOURNAL (or

FANCYPRINTER) style instead of the default style.

Solution:

Result:









Ch18 - Creating Tabular Reports

Chapter 18(Creating Tabular Reports)


In this chapter, I am  going to generate the report in tabular form. that displaying frequencies, percentages, and descriptive statistics (such as sums and means) broken by one or more CLASS variables.
Chapter#18 and Problem#1


All the problems, except Problem 10, use the SAS data set College.
Note: Data set College has permanent formats for Gender, SchoolSize, and Scholarship.
Either make this format catalog available to SAS (see Chapter 5), run the PROC
FORMAT statements here, or use the system option NOFMTERR (no format
error) that allows you to access SAS data sets that have permanent user-defined
formats without causing an error.
proc format;
value $yesno 'Y','1' = 'Yes'
'N','0' = 'No'
' ' = 'Not Given';
value $size 'S' = 'Small'
'M' = 'Medium'
'L' = 'Large'
' ' = 'Missing';
value $gender 'F' = 'Female'
'M' = 'Male'
' ' = 'Not Given';
run;
For each problem, you need to write the appropriate PROC TABULATE statements to
produce the given table.

Produce the following table. Note that the last row in the table represents all subjects.
Demographics from COLLEGE Data Set-

Solution:

Result:


Learning:
1# In this  problem, I have learnt what happened when we use proc tabulate data step 
2# proc tabulate uses a class statement to allow you to specify variables that represent category (often) character variable where we want to compute frequencies or percentage.
3# In this problem, gender, scholarship and schoolsize taken as a class variable and tables statement specify the appearance.
4# Any variable listed in a TABLE statement must be listed in a CLASS statement or a VAR statement (to be discussed shortly).

Chapter#18 and Problem #3

Produce the following table. Note that the ALL column has been renamed Total and
Gender has been formatted.
Demographics from COLLEGE Data Set

Solution:

Result:

Learning:
1# In this problem, I have broken down the school size in one or more variable of the class variable such as  gender or school size.
2# Here, I have combined frequencies and descriptive statistics in single table. 
3# School size broken into class gender, scholarship and school size.



Ch 17 - Counting Frequencies

Chapter 17(Counting Frequencies)


In this chapter I will learn how to generate frequency table with the help proc freq. Proc freq use for both character as well as numeric variable in one-way, two-way, and three-way. I can also use proc freq to create output data stes containing counts and percentage. We can also measure various statistics such as chi-square, odds ratio and relative risk 

Chapter#17 and Problem#1

Using the SAS data set Blood, generate one-way frequencies for the variables

Gender, BloodType, and AgeGroup. Use the appropriate options to omit the
cumulative statistics and percentages.

Solution:



Result:

Learning:
1# In this problem I have learnt how to generate frequency table 
2# I have used tables statement for compute frequency every variable in dataset
3# I have used nocum and noperecent that indicates that i do not want cumulative frequency as well as percentage.

Chapter#17 and Problem #3

Using the data set Blood, produce frequencies for the variable Chol (cholesterol).
Use a format to group the frequencies into three groups: low to 200 (normal), 201
and higher (high), and missing. Run PROC FREQ twice, once using the MISSING
option, and once without. Compare the percentages in both listings.

Solution:
Result:

Learning:
1# In this problem, I have proc format for formatting for labelling the output. it improves the appearance of the report when we used a format to label these values. 
2# proc freq used because I want to compute frequencies on formatted values, we can also use format to group values together into larger categories.


Ch16 - Summarizing Your Data

Chapter#16: (Summarizing Your Data)


In this chapter I am going to the use of proc means and proc summary to generate summary reports. Summary consists several statistics part like sum and means of numeric variables. However, these procedures are much more versatile and can be used to create summary
data sets that can then be analysed with more data or proc steps.


Chapter#16 and Problem#1:
Using the SAS data set College, compute the mean, median, minimum, and
maximum and the number of both missing and non-missing values for the variables

ClassRank and GPA. Report the statistics to two decimal places.

Solution:



Result:

Learning:
1# I have used title statement that gives the title for the output
2# how to use proc means to generate reports
3# I have used to variable that is ClassRank and GPA
4# for the customization I have used N and N miss
5# in code I used maxdec=2  that shows the values goes upto two decimal point

Chapter#16 and Problem #3
Using the SAS data set College, report the mean and median GPA and ClassRank

broken down by school size (SchoolSize). Do this twice, once using a BY statement,
and once using a CLASS statement.

Solution:
Result:

Learning:
1# when we want to see descriptive statistics for each level of another variable then we use by statement.
2# when we use this first of all we have to sort the data set by the same variable or variables you list in by statement
3#  here i sorted the data set by school size thats why you can see in result display output by Missing, Small, Large school size.
4# the same thing i have done with the use of class statement that shows the same result


Chapter#16 and Problem#5
Using the SAS data set College, report the mean GPA for the following categories of

ClassRank: 0–50 = bottom half, 51–74 = 3rd quartile, and 75 to 100 = top
quarter. Do this by creating an appropriate format. Do not use a DATA step.

Solution:

Result:

Learning:
1# One very nice feature of using a CLASS statement (besides not having to sort your data)is that SAS uses formatted values of the CLASS variable(s).
2# Here I have used proc format statement to the procedure and changing how the class variable groups your data.
3# Like here I have used for Bottom Half, Third Quartile and Top Quartile and formatting is done on the basis of ClassRank.




SAS Index

SAS 2nd Practice Problem

Chapter#16: (Summarizing Your Data)

In this chapter I am going to the use of proc means and proc summary to generate summary reports. Summary consists several statistics part like sum and means of numeric variables. However, these procedures are much more versatile and can be used to create summary
data sets that can then be analysed with more data or proc steps.


Chapter#16 and Problem#1:
Using the SAS data set College, compute the mean, median, minimum, and
maximum and the number of both missing and non-missing values for the variables

ClassRank and GPA. Report the statistics to two decimal places.

Solution:


 Result:

Learning:
1# I have used title statement that gives the title for the output
2# how to use proc means to generate reports
3# I have used to variable that is ClassRank and GPA
4# for the customization I have used N and N miss
5# in code I used maxdec=2  that shows the values goes upto two decimal point

Chapter#16 and Problem #3
Using the SAS data set College, report the mean and median GPA and ClassRank

broken down by school size (SchoolSize). Do this twice, once using a BY statement,
and once using a CLASS statement.

Solution:
Result:

Learning:
1# when we want to see descriptive statistics for each level of another variable then we use by statement.
2# when we use this first of all we have to sort the data set by the same variable or variables you list in by statement
3#  here i sorted the data set by school size thats why you can see in result display output by Missing, Small, Large school size.
4# the same thing i have done with the use of class statement that shows the same result


Chapter#16 and Problem#5
Using the SAS data set College, report the mean GPA for the following categories of

ClassRank: 0–50 = bottom half, 51–74 = 3rd quartile, and 75 to 100 = top
quarter. Do this by creating an appropriate format. Do not use a DATA step.

Solution:

Result:

Learning:
1# One very nice feature of using a CLASS statement (besides not having to sort your data)is that SAS uses formatted values of the CLASS variable(s).
2# Here I have used proc format statement to the procedure and changing how the class variable groups your data.
3# Like here I have used for Bottom Half, Third Quartile and Top Quartile and formatting is done on the basis of ClassRank.

Chapter 17(Counting Frequencies)

In this chapter I will learn how to generate frequency table with the help proc freq. Proc freq use for both character as well as numeric variable in one-way, two-way, and three-way. I can also use proc freq to create output data stes containing counts and percentage. We can also measure various statistics such as chi-square, odds ratio and relative risk 

Chapter#17 and Problem#1

Using the SAS data set Blood, generate one-way frequencies for the variables

Gender, BloodType, and AgeGroup. Use the appropriate options to omit the
cumulative statistics and percentages.

Solution:



Result:

Learning:
1# In this problem I have learnt how to generate frequency table 
2# I have used tables statement for compute frequency every variable in dataset
3# I have used nocum and noperecent that indicates that i do not want cumulative frequency as well as percentage.

Chapter#17 and Problem #3

Using the data set Blood, produce frequencies for the variable Chol (cholesterol).
Use a format to group the frequencies into three groups: low to 200 (normal), 201
and higher (high), and missing. Run PROC FREQ twice, once using the MISSING
option, and once without. Compare the percentages in both listings.

Solution:
Result:

Learning:
1# In this problem, I have proc format for formatting for labelling the output. it improves the appearance of the report when we used a format to label these values. 
2# proc freq used because I want to compute frequencies on formatted values, we can also use format to group values together into larger categories.

Chapter 18(Creating Tabular Reports)

In this chapter, I am  going to generate the report in tabular form. that displaying frequencies, percentages, and descriptive statistics (such as sums and means) broken by one or more CLASS variables.
Chapter#18 and Problem#1


All the problems, except Problem 10, use the SAS data set College.
Note: Data set College has permanent formats for Gender, SchoolSize, and Scholarship.
Either make this format catalog available to SAS (see Chapter 5), run the PROC
FORMAT statements here, or use the system option NOFMTERR (no format
error) that allows you to access SAS data sets that have permanent user-defined
formats without causing an error.
proc format;
value $yesno 'Y','1' = 'Yes'
'N','0' = 'No'
' ' = 'Not Given';
value $size 'S' = 'Small'
'M' = 'Medium'
'L' = 'Large'
' ' = 'Missing';
value $gender 'F' = 'Female'
'M' = 'Male'
' ' = 'Not Given';
run;
For each problem, you need to write the appropriate PROC TABULATE statements to
produce the given table.

Produce the following table. Note that the last row in the table represents all subjects.
Demographics from COLLEGE Data Set-

Solution:

Result:



Learning:
1# In this  problem, I have learnt what happened when we use proc tabulate data step 
2# proc tabulate uses a class statement to allow you to specify variables that represent category (often) character variable where we want to compute frequencies or percentage.
3# In this problem, gender, scholarship and schoolsize taken as a class variable and tables statement specify the appearance.
4# Any variable listed in a TABLE statement must be listed in a CLASS statement or a VAR statement (to be discussed shortly).

Chapter#18 and Problem #3

Produce the following table. Note that the ALL column has been renamed Total and
Gender has been formatted.
Demographics from COLLEGE Data Set

Solution:

Result:

Learning:
1# In this problem, I have broken down the school size in one or more variable of the class variable such as  gender or school size.
2# Here, I have combined frequencies and descriptive statistics in single table. 
3# School size broken into class gender, scholarship and school size.

Chapter 19 (Output Delivery System)

One of the most important changes in SAS, the introduction of Output Delivery System called as ODS. Each SAS procedure produce its unique output . This output was difficult to incorporate into other documents, such as Microsoft Office Word orMicrosoft Office PowerPoint.The Output Delivery System changed all that. Now each SAS procedure creates outputobjects that can be sent to such destinations as HTML, RTF, PDF, and SAS data sets.
 Not only can you send your SAS output to all of these formats, you can, if you are brave enough, customize the output’s fonts, colors, size, and
layout.

Chapter#19 and Problem#1

Run the following program, sending the output to an HTML file. Issue the
appropriate commands to prevent SAS from creating a listing file. See Chapter 16,
Problem 2, for the note about creating formats for this data set.
title "Sending Output to an HTML File";
proc print data=learn.college(obs=8) noobs;
run;
proc means data=learn.college n mean maxdec=2;
var GPA ClassRank;
run;

Solution:
Result:


Chapter#19 and Problem#3

Run the same two procedures as shown in Problem 1, except use the JOURNAL (or

FANCYPRINTER) style instead of the default style.

Solution:

Result:


Chapter 20(Generating High-Quality Graphics)

This chapter introduces some basic concepts behind SAS/GRAPH software. Here you
will see how to create simple bar charts, scatter plots, and line graphs. There are many
excellent books and manuals available from SAS Press.

Chapter#20 and Problem#3

Using the SAS data set Bicycles, produce a vertical bar chart showing the
distribution of Total Sales (TotalSales). Use midpoints of 0 to 12,000, with intervals
of 2,000.

Solution:

Result:


Chapter#20 and Problem#9

Using the SAS data set Stocks, produce a scatter plot of Price versus Date. Use the
dot as the plotting symbol and connect the dots with a smooth line.

Solution:

Result:



Saturday, 19 September 2015

SAS Assignment #Praxis

Data Sets created are:
-Hosp
-data B29.hosp;
   do j = 1 to 1000;
      AdmitDate = int(ranuni(1234)*1200 + 15500);
      quarter = intck('qtr','01jan2002'd,AdmitDate);
      do i = 1 to quarter;
         if ranuni(0) lt .1 and weekday(AdmitDate) eq 1 then
            AdmitDate = AdmitDate + 1;
         if ranuni(0) lt .1 and weekday(AdmitDate) eq 7 then
            AdmitDate = AdmitDate - int(3*ranuni(0) + 1);
         DOB = int(25000*Ranuni(0) + '01jan1920'd);
         DischrDate = AdmitDate + abs(10*rannor(0) + 1);
         Subject + 1;
         output;
      end;
   end;
   drop i j;
   format AdmitDate DOB DischrDate mmddyy10.;
run;

- Sales data set
data B29.sales;
   input    EmpID     :       $4.
            Name      &      $15.
            Region    :       $5.
            Customer  &      $18.
            Date      : mmddyy10.
            Item      :       $8.
            Quantity  :        5.
            UnitCost  :  dollar9.;
   TotalSales = Quantity * UnitCost;
/*   format date mmddyy10. UnitCost TotalSales dollar9.;*/
   drop Date;
datalines;
1843 George Smith  North Barco Corporation  10/10/2006 144L 50 $8.99
1843 George Smith  South Cost Cutter's  10/11/2006 122 100 $5.99
1843 George Smith  North Minimart Inc.  10/11/2006 188S 3 $5,199
1843 George Smith  North Barco Corporation  10/15/2006 908X 1 $5,129
1843 George Smith  South Ely Corp.  10/15/2006 122L 10 $29.95
0177 Glenda Johnson  East Food Unlimited  9/1/2006 188X 100 $6.99
0177 Glenda Johnson  East Shop and Drop  9/2/2006 144L 100 $8.99
1843 George Smith  South Cost Cutter's  10/18/2006 855W 1 $9,109
9888 Sharon Lu  West Cost Cutter's  11/14/2006 122 50 $5.99
9888 Sharon Lu  West Pet's are Us  11/15/2006 100W 1000 $1.99
0017 Jason Nguyen  East Roger's Spirits  11/15/2006 122L 500 $39.99
0017 Jason Nguyen  South Spirited Spirits  12/22/2006 407XX 100 $19.95
0177 Glenda Johnson  North Minimart Inc.  12/21/2006 777 5 $10.500
0177 Glenda Johnson  East Barco Corporation  12/20/2006 733 2 $10,000
1843 George Smith  North Minimart Inc.  11/19/2006 188S 3 $5,199

;

- Blood data set
data B29.blood;
   infile datalines truncover;
   length Gender $ 6 BloodType $ 2 AgeGroup $ 5;
   input Subject 
         Gender 
         BloodType 
         AgeGroup
         WBC 
         RBC 
         Chol;
   label Gender = "Gender"
         BloodType = "Blood Type"
         AgeGroup = "Age Group"
         Chol = "Cholesterol";
datalines;
1    Female AB Young 7710   7.4  258
2    Male   AB Old   6560   4.7  .
3    Male   A  Young 5690   7.53 184
4    Male   B  Old   6680   6.85 .
5    Male   A  Young .      7.72 187
6    Male   A  Old   6140   3.69 142
7    Female A  Young 6550   4.78 290
8    Male   O  Old   5200   4.96 151
9    Male   O  Young .      5.66 311
10   Female O  Young 7710   5.55 .
11   Male   B  Young .      5.62 152
12   Female O  Young 7410   5.85 241
13   Male   O  Young 5780   4.37 .
14   Female O  Old   5590   6.94 152
15   Female A  Old   6520   6.03 217
16   Female O  Young 7210   5.17 193
17   Male   A  Old   .      5.63 .
18   Male   O  Old   6410   6.02 224
19   Female A  Old   6360   3.74 211
20   Male   A  Young 7580   5.13 179
21   Female A  Old   7150   6.35 200
22   Female A  Young 8710   5.12 211
23   Female O  Young 7660   4.91 .
24   Female B  Young 8280   6.14 .
25   Female AB Old   7480   4.7  183
;

- Missing data set
data B29.missing;
   input A $ B $ C $;
datalines;
X Y Z
X Y Y
Z Z Z
X X .
Y Z .
X . .

;

- data set for Inventory and New Product

data B29.inventory;
   input Model $ Price;
   format Price dollar8.2;
datalines;
M567 23.50
S888 12.99
L776 159.98
X999 29.95
M123 4.59
S776 1.99
;
data B29.newproducts;
   input Model $ Price;
   format Price dollar8.2;
datalines;
L939 10.99
M135 .75
;

- data set for purchase
data B29.purchase;
   input CustNumber Model $ Quantity;
datalines;
101 L776 1
102 M123 10
103 X999 2
103 M567 1
;

- data set for health
data B29.health;
   input Subj : $3.
         Height
         Weight;
datalines;
001 68 155
003 74 250
004 63 110
005 60 95
;

- data set for Psych
data B29.psych;
   input ID : $3. Ques1-Ques10 Score1-Score5;
datalines;
001 1 3 2 4 5 4 3 4 5 4 90 92 93 90 88
002 3 3 . . 3 4 5 5 1 . 95 . . 86 85
003 . . . . 5 5 4 4 3 3 88 87 86 85 84
004 5 3 4 5 . 5 4 3 3 . 78 78 82 84 .
005 5 4 3 2 1 1 2 3 4 5 92 93 94 95 99

;
- data set for nines
data B29.nines;
   infile datalines missover;
   input x y z (Char1-Char3)(:$1.) a1-a5;
datalines;
1 2 3 a b c 99 88 77 66 55
2 999 999 d c e 999 7 999
10 20 999 b b b 999 999 999 33 44
;

Performing Conditional Processing

Problem 1 :
 Using the SAS data set Hosp, use PROC PRINT to list observations for Subject
values of 5, 100, 150, and 200. Do this twice, once using OR operators and once
using the IN operator.

Note: Subject is a numeric variable

Solution:

Result:


Learning: 
  • Here we can see that two result but both are the same. Only difference is that in first result there is observation column and in second is not.
  • In second one, we have used noobs in proc print statement that result no observation.
  • In coding, for the first one, you can see that the result obtain through OR operator and second by IN operator.


Problem 2 :
 Using the Sales data set, list the observations for employee numbers (EmpID) 9888
and 0177. Do this two ways, one using OR operators and the other using the IN
operator.
Note: EmpID is a character variable.

Solution:
Result:
Learning:
  •  In this problem we have used Where statement to subset data, when your input data is indexed, the Where statement might be more efficient.
  • You may also use a Where statement in  a SAS procedure to subset the data being processed.

Problem 3 :
 Starting with the Blood data set, create a new, temporary SAS data set containing
all the variables in Blood plus a new variable called CholGroup. Define this new
variable as follows:
􀂃 CholGroup Chol
􀂃 Low Low – 110
􀂃 Medium 111 – 140
􀂃 High 141 – High
Use a SELECT statement to do this.

Solution:

Result:

Learning:
  • Here we have used Select statement that is alternative of If and Else If statements. Select Chol group, if the statement is true, the statement following the when expression is executed and control skip to the end of the select group.

Performing Iterative Processing: Looping

Problem 4:
   
 Run the program here to create a temporary SAS data set called Vitals:
data vitals;
input ID : #3.
Age
Pulse
SBP
DBP;
label SBP = "Systolic Blood Pressure"
DBP = "Diastolic Blood Pressure";
datalines;
001 23 68 120 80
002 55 72 188 96
003 78 82 200 100
004 18 58 110 70
005 43 52 120 82
006 37 74 150 98
007 . 82 140 100
;

Solution:
Result:
Learning:
  • In this problem, we have learnt how to use label statement. Label statement basically use for column heading  this listing here is more readable but here label statement is not working.
  • For run the program through label you have to give the label statement in proc step.


Using this data set, create a new data set (NewVitals) with the following new
variables:
For subjects less than 50 years of age:
If Pulse is less than 70, set PulseGroup equal to Low;
otherwise, set PulseGroup equal to High.
If SBP is less than 130, set SBPGroup equal to Low;
otherwise, set SBPGroup equal to High.
For subjects greater than or equal to 50 years of age:
If Pulse is less than 74, set PulseGroup equal to Low;
otherwise, set PulseGroup equal to High.
If SBP is less than 140, set SBPGroup equal to Low;
otherwise, set SBPGroup equal to High.
You may assume there are no missing values for Pulse or SBP.

Solution:

Result:
Learning:
  •  In this problem, we have learnt how to use if else statement because question is itself self explanatory only we have to follow that step.
  • If statement is true then execute IF statement otherwise go to the Else statement like if Pulse less than (lt) 70 then it groups in 'Low ' pulse group otherwise 'High' pulse group.
Problem 5:

Run the program here to create a temporary SAS data set (MonthSales):

data monthsales;
input month sales @@;
/* add your line(s) here */
datalines;
1 4000 2 5000 3 . 4 5500 5 5000 6 6000 7 6500 8 4500
9 5100 10 5700 11 6500 12 7500
;
Modify this program so that a new variable, SumSales, representing Sales to date, is
added to the data set. Be sure that the missing value for Sales in month 3 does not
result in a missing value for SumSales.

Solution:


Result:
Learning: 
  • In this sum, we have created a new variable that is Sum Sales that consist cumulative sales.
  • Here we have also used retain statement for Sum Sales in starting the Sum Sales should be zero. for missing value we have given the not missing statement. if we didn't give the not missing statement then it prints the Sum Sales Value till observation 2 after that it prints only missing value in Sum Sales column.
Problem 6:

Count the number of missing values for the variables A, B, and C in the Missing data
set. Add the cumulative number of missing values to each observation (use variable
names MissA, MissB, and MissC). Use the MISSING function to test for the missing
values.

Solution:

Result:


Learning:
  • Here we have counted the total number of missing value for each individual column with the help of if (missing) statement 
  • As you can see in result that column MissA has total zero missing value while MissB amd MissC has one and three respectively.
Subsetting and Combining SAS Data Sets

Problem 7:

Using the SAS data set Blood, create two temporary SAS data sets called Subset_A
and Subset_B. Include in both of these data sets a variable called Combined equal to
.001 times WBC plus RBC. Subset_A should consist of observations from Blood
where Gender is equal to Female and BloodType is equal to AB. Subset_B should
consist of all observations from Blood where Gender is equal to Female, BloodType
is equal to AB, and Combined is greater than or equal to 14.

Solution:



Result:



Learning:
  • Here, there are two data set one is subset_a and and another is subset_b .
  • In data set subset_a, we have used Where condition with And statement while in subset_b, If with and statement but both doing the same thing first subsetting then combining .
  • This is most widely used in subsetting and combining statement.

Problem 8:

Print out the observations in the two data sets Inventory and NewProducts. Next,
create a new temporary SAS data set (Updated) containing all the observations in
Inventory followed by all the observations in NewProducts. Sort the resulting data
set and print out the observations.

Solution:

Result:

Learning:
  • In this problem we have used proc sort data that followed the by statement like here proc sort data =updated by model.
  • In result you can see that the column start with model column that is the beauty of proc sort data
  • this proc sort data is too large then it is very helpful, after sorting we can easily do other operation from the table

Problem 9: (Merging two data sets)

Merge the Purchase and Inventory data sets to create a new, temporary SAS data set
(Pur_Price) where the Price value found in the Inventory data set is added to each
observation in the Purchase data set, based on the Model number (Model). There are
some models in the Inventory data set that were not purchased (and, therefore, are
not in the Purchase data set). Do not include these models in your new data set.
Based on the variable Quantity in the Purchase data set, compute a total cost
(TotalCost) equal to Quantity times Price in this data set as well.

Solution:


Result:

Learning:
  • Sas use the term merge to describe the process of combining variables (columns) from two or more data sets like same thing that happened in this problem.
  • First, we have sort the both data set by the Model variable that means both table starting variable is Model .
  • Now we have merge both data set B29.inventory and B29.purchase in one data set that is B29.pur_price.
Working with Numeric Functions

Problem 10:

Using the SAS data set Health, compute the body mass index (BMI) defined as the
weight in kilograms divided by the height (in meters) squared. Create four other
variables based on BMI: 1) BMIRound is the BMI rounded to the nearest integer, 2)
BMITenth is the BMI rounded to the nearest tenth, 3) BMIGroup is the BMI rounded
to the nearest 5, and 4) BMITrunc is the BMI with a fractional amount truncated.
Conversion factors you will need are: 1 Kg equals 2.2 Lbs and 1 inch = .0254 meters.

Solution:


Result:

Learning:
  • Two of the most useful function are the ROUND and INT. as the name suggest round is used to round numbers.
  • The INT function returns the integer portion of a numeric value like in BMIGroup number is round by thats why in that column every number is multiple of 5.
  • Typical values for round-off units are .1, .01, .001 and so on.
Problem 11:

The SAS data set Psych contains an ID variable, 10 question responses (Ques1–
Ques10), and 5 scores (Score1–Score5). You want to create a new, temporary SAS
data set (Evaluate) containing the following:
a. A value (ScoreAve) consisting of the mean of the three highest Score values. If
there are fewer than three non-missing score values, ScoreAve should be
missing.
b. An average of Ques1–Ques10 (call it QuesAve) if there are seven or more nonmissing
values.
c. A composite score (Composit) equal to ScoreAve plus 10 times QuesAve.

Solution:

Result:
Learning:
  • In this problem we have learnt, how and why use Largest function.
  • Like in this sum, we have used (mean(largest(2,of  Score1-Score5))) that returns the mean of two largest value from the Score1-Score5.
  • We can also use max,min,sum,etc in the place of mean.

Working with Character Functions

Problem 12: 

  Look at the following program and determine the storage length of each of the
variables:
data storage;
length A $ 4 B $ 4;
Name = 'Goldstein';
AandB = A || B;
Cat = cats(A,B);
if Name = 'Smith' then Match = 'No';
else Match = 'Yes';
Substring = substr(Name,5,2);
run;
A _________________
B _________________
Name _________________
AandB _________________
Cat _________________
Match _________________
Substring _________________

Solution:


Result:


Learning:
  • Here we have used length function that returns the length of a character value but it not counts trailing blanks.
  • Lengthc returns the storage length of a string for each individual variable.

Problem 13:


Here is a listing of data set Names_And_More:
Listing of Data Set LEARN.NAMES_AND_MORE
Name Phone Height Mixed
Roger Cody (908)782-1234 5ft. 10in. 50 1/8
Thomas Jefferson (315) 848-8484 6ft. 1in. 23 1/2
Marco Polo (800)123-4567 5Ft. 6in. 40
Brian Watson (518)355-1766 5ft. 10in 89 3/4
Michael DeMarco (445)232-2233 6ft. 76 1/3
Create a new, temporary SAS data set (Names_And_More) using the permanent SAS
data set Names_And_More with the following changes:
a. Name has only single blanks between the first and last name.
b. Phone contains only digits (and is still a character value).

Solution:



Result:
Learning: 
  • In this we have used compbl that means compress blank for converting blanks into a single blank
  • We can also convert the case of observation,here Name convert in only single blank between first and last name like Roger Cody. Previously it consists five blanks
  • we  have used kd in Phone variable that means keep digits (deletes everything else)

Working with Arrays


Problem 14:

Using the SAS data set Nines, create a new temporary SAS data set (Nonines) where
all values of 999 are replaced by SAS missing values. Do this without explicitly
naming the numeric variables in data set Nines (use _NUMERIC_ when you define
your array).

Solution:


Result:

Learning:
  • Arrays are used to perform a similar operation on a group variables.
  • In this problem, we have Sas data set B29.nines that contains several numeric variable.
  • this data set used a value of 999 whenever there was a missing value. Values are like 999 or 9999 to represent this missing value.
  • All the values from the data set that is 999 is replaced by missing value with the help of this function


Presenting and Summarizing Your Data

Problem 15:

List the first 10 observations in data set Blood. Include only the variables Subject,
WBC (white blood cell), RBC (red blood cell), and Chol. Label the last three
variables “White Blood Cells,” “Red Blood Cells,” and “Cholesterol,” respectively.
Omit the Obs column, and place Subject in the first column. Be sure the column

headings are the variable labels, not the variable names.

Solution:



Result:


Learning:
  • In this sum, we have to find the first 10 observation from a large blood data set.
  • For that we have used title statement with double quotes that is First 10 observation in blood.
  • To run only first 10, we have obs statement in proc print step that is here like (obs=10) with label.
  • Here we have also used label statement for column name to be more readable WBC means White Blood cells and RBC means Red Blood Cells etc.

Creating Customized Reports

Problem 16:


Use PROC REPORT to create a report, as shown here:
Note: The data set is Blood, and the variables to be listed are Subject, WBC, and
RBC. All three variables are numeric (be careful). There is a line after the
heading.
First 5 Observations from Blood Data Set
White Red
Subject Blood Blood
Number Cells Cells
___________________

1 7,710 7.40
2 6,560 4.70
3 5,690 7.53
4 6,680 6.85
5 . 7.72

Solution:


Result: