SAS - Only Important Parts

Chitranshu Vashishth

29th January 2025

Table of Contents

SAS - Only Important PartsTable of ContentsCreate Datasets and set statement.1. Creating Datasets in SASBasic Explanation:Key Methods:2. SET StatementBasic Explanation:Key Uses:Practical Example for Credit Risk Context:Common Pitfalls to Avoid:Best Practices:Keep columns when reading / appending / manipulatingKEEP Statement and OptionsBasic Explanation:Different Methods to Use KEEP1. KEEP as Dataset Option2. KEEP as Statement3. KEEP in Multiple Dataset ScenariosPractical Example for Credit Risk Context:Advanced Usage with MERGE:Important Considerations:Best Practices:Common Pitfalls to Avoid:Quick Reference Table:Drop columns when reading / appending / manipulatingDROP Statement and OptionsBasic Explanation:Different Methods to Use DROP1. DROP as Dataset Option2. DROP as Statement3. DROP in Multiple Dataset ScenariosPractical Example for Credit Risk Context:Advanced Usage:1. Using DROP with MERGE:2. Using DROP with Variable Lists:Important Considerations:Best Practices:Common Pitfalls to Avoid:Quick Reference Table:Additional Tips:Data manipulation and creating new columnsData Manipulation and Column CreationBasic Explanation:1. Basic Column Creation2. Practical Credit Risk Example3. String Manipulation Examples4. Date Manipulations5. Advanced CalculationsCommon Functions for Credit Risk Analysis:Best Practices:Common Pitfalls to Avoid:Rename a column in-place or otherwiseRenaming Columns in SAS1. Using RENAME= Dataset Option2. Using RENAME Statement3. Using PROC DATASETS (In-Place Renaming)4. Using PROC SQLPractical Examples in Credit Risk Context:1. Comprehensive Example with Multiple Renaming Methods:2. Renaming Multiple Variables with Pattern:3. Combining with Other Operations:Best Practices:Common Pitfalls to Avoid:Quick Reference Table:Change formatting of a column for existing dataset, or while at creation/appending/mergeFormat Conversions in SASBasic Methods to Change Formats:1. Text to Number Conversions2. Number to Text Conversions3. Date Format Conversions4. Comprehensive Credit Risk Example5. Changing Formats for Existing Dataset6. Common Date Format ConversionsCommon Format Types:Best Practices:Formats for string, number, date in SAS. How is date stored as and how to handle it1. String (Character) Formats2. Numeric Formats3. Date Formats and StorageHow SAS Stores Dates:4. Common Date Formats5. Date Handling and Calculations6. Datetime and Time Formats7. Best Practices for Date HandlingCommon Date Functions Reference:Length, Format, Informat1. LENGTH Statement2. FORMAT Statement3. INFORMAT StatementComprehensive Example Using All ThreeCommon Format Types Reference TableAdvanced Usage Examples1. Complex Data Structure2. Using ATTRIB Statement (Combines LENGTH, FORMAT, and INFORMAT)3. Handling Special CasesBest PracticesPut, Input, Putn, Inputn and their usage within macro function context1. Basic Functions Overview2. PUT vs PUTN3. INPUT vs INPUTN4. Macro Context Usage5. Complex Examples in Credit Risk Context6. Error Handling Examples7. Common Use Cases Reference TableBest Practices:PROC SQL1. Basic SQL Operations2. Joins3. Aggregations and Group By4. Subqueries and Complex Logic5. Data Modification6. Advanced Features7. Complex Example with Multiple FeaturesBest Practices:Merging datasets on both axes including handling mismatching column types, and mismatching column lengths1. Basic Merge TypesA. Horizontal Merge (MERGE Statement)B. Vertical Merge (SET Statement)2. Handling Column Type Mismatches3. Handling Length Mismatches4. Complex Merge Scenarios5. Handling Multiple Types of Mismatches6. Best Practices and TipsSAS string or character functions1. Basic String Functions2. String Extraction and Searching3. Advanced String Manipulation4. Pattern Matching and Regular Expressions5. String Functions for Credit Risk Analysis6. Common String Function Reference Table7. Practical Examples8. Error Handling with String FunctionsBest Practices:SAS date and time functions1. Basic Date and Time Functions2. Interval Differences Between Dates3. Date Shifting (Past and Future Dates)4. Date Components and Extraction5. Complex Date Calculations for Credit Risk6. Working with Time Components7. Date/Time Function Reference Table8. Common Date FormatsBest Practices:Sorting a dataset and De-duplicating a dataset on different set of columns or entire dataset1. Basic Sorting Using PROC SORT2. De-duplication Using NODUPKEY/NODUPREC3. Comprehensive Example with Credit Data4. Advanced De-duplication Using DATA Step5. Using FIRST. and LAST. Variables6. Using SQL for De-duplication7. Complex Sorting and De-duplication8. Best Practices and TipsCommon Pitfalls to Avoid:Macro Programming1. Macro Variables - Basic Creation and Usage2. Referencing Macro Variables3. Macro Functions4. Complex Macro Example for Credit Risk Analysis5. Advanced Macro Techniques6. Error Handling and Debugging7. Utility MacrosBest Practices:Creating custom macro functions1. Basic Macro Function Structure2. Simple Custom Functions3. String Manipulation Functions4. Date Manipulation Functions5. Validation Functions6. Complex Business Logic Functions7. Utility Functions8. Error Handling FunctionsBest Practices:Loops within data step and within Macros1. DATA Step LoopsA. DO Loop (Simple Counter)B. DO WHILE LoopC. DO UNTIL LoopD. Nested DO LoopsE. Complex DATA Step Loop Example2. Macro LoopsA. %DO LoopB. %DO %WHILE LoopC. %DO %UNTIL LoopD. Complex Macro Loop Examples3. Combined DATA Step and Macro Loops4. Error Handling in LoopsBest Practices:Conditional blocks within data step and within Macros1. DATA Step Conditional BlocksA. Basic IF-THEN-ELSEB. IF-THEN-DO BlocksC. SELECT StatementsD. Complex Nested Conditions2. Macro Conditional BlocksA. Basic %IF-%THEN-%ELSEB. Complex Macro ConditionsC. Conditional Macro Execution3. Combined Complex ExamplesBest Practices:Read/Export from CSV and Excel, Create Inline Datasets, Delete Datasets1. Reading FilesA. Reading CSV FilesB. Reading Excel Files2. Exporting FilesA. Exporting to CSVB. Exporting to Excel3. Creating Datasets In-PlaceA. Simple Dataset CreationB. Creating Empty Dataset with Structure4. Deleting DatasetsA. Using PROC DELETEB. Using PROC DATASETS5. Complex ExamplesA. Comprehensive File Processing6. Best PracticesPROC SUMMARY, PROC FREQ, PROC FORMATS1. PROC SUMMARY2. PROC FREQ3. PROC FORMAT4. Combined Complex ExampleBest Practices:That's all folks 🎉✨

 

Create Datasets and set statement.

Here's a comprehensive tutorial on creating datasets and using the SET statement in SAS:

1. Creating Datasets in SAS

Basic Explanation:

In SAS, datasets (also called SAS tables) are the primary way to store and manipulate data. There are multiple ways to create datasets:

Key Methods:

  1. Basic Dataset Creation

  1. Creating Dataset from Existing Dataset

  1. Creating Empty Dataset

2. SET Statement

Basic Explanation:

The SET statement is used to read observations from one or more existing SAS datasets.

Key Uses:

  1. Reading Single Dataset

  1. Combining Multiple Datasets (Vertically)

  1. Using SET with Options

Practical Example for Credit Risk Context:

Common Pitfalls to Avoid:

  1. Not specifying variable lengths in empty datasets

  2. Forgetting to use RUN statements

  3. Not considering dataset options (like KEEP, DROP, WHERE) when using SET

  4. Not handling missing values appropriately

Best Practices:

  1. Always define variable attributes (length, format) explicitly for empty datasets

  2. Use meaningful dataset names

  3. Comment your code

  4. Consider using WHERE clauses for filtering during SET operations

  5. Always verify the dataset creation with PROC CONTENTS or PROC PRINT


Keep columns when reading / appending / manipulating

 

KEEP Statement and Options

Basic Explanation:

KEEP is used to select specific variables (columns) to retain in the output dataset. It can be used in multiple ways:

  1. As a dataset option

  2. As a statement in DATA step

  3. During dataset manipulation operations

Different Methods to Use KEEP

1. KEEP as Dataset Option

2. KEEP as Statement

3. KEEP in Multiple Dataset Scenarios

Practical Example for Credit Risk Context:

Advanced Usage with MERGE:

Important Considerations:

  1. Performance Impact:

  1. Timing Differences:

Best Practices:

  1. Documentation

  1. Variable Lists

Common Pitfalls to Avoid:

  1. Not checking if required variables exist before using KEEP

  2. Forgetting that KEEP options are processed before statements

  3. Not considering the order of operations when using multiple data steps

  4. Dropping variables needed for calculations before they're used

Quick Reference Table:

MethodUsageWhen to Use
Dataset Optiondataset(KEEP=var1 var2)Most efficient, use when possible
StatementKEEP var1 var2;When need to keep variables based on conditions
Multiple DatasetsSET data1(KEEP=...) data2(KEEP=...);When combining datasets

Drop columns when reading / appending / manipulating

DROP Statement and Options

Basic Explanation:

DROP is the opposite of KEEP - it's used to remove specific variables (columns) from the output dataset. Like KEEP, it can be used in multiple ways:

  1. As a dataset option

  2. As a statement in DATA step

  3. During dataset manipulation operations

Different Methods to Use DROP

1. DROP as Dataset Option

2. DROP as Statement

3. DROP in Multiple Dataset Scenarios

Practical Example for Credit Risk Context:

Advanced Usage:

1. Using DROP with MERGE:

2. Using DROP with Variable Lists:

Important Considerations:

  1. Performance Impact:

  1. Timing Differences:

Best Practices:

  1. Documentation

  1. Efficient Variable Selection

Common Pitfalls to Avoid:

  1. Dropping variables needed for calculations

  2. Not considering the order of operations

  3. Dropping key identifier variables

  4. Using DROP when KEEP might be more appropriate

Quick Reference Table:

MethodUsageBest Use Case
Dataset Optiondataset(DROP=var1 var2)Most efficient, use when possible
StatementDROP var1 var2;When need to drop after calculations
Variable ListsDROP var: temp:When dropping groups of similarly named variables

Additional Tips:

  1. Use PROC CONTENTS to verify dropped variables:

  1. Combine with other options:


Data manipulation and creating new columns

Data Manipulation and Column Creation

Basic Explanation:

In SAS, you can create new columns through:

  1. Direct assignment

  2. Mathematical operations

  3. Conditional logic

  4. Functions

  5. String operations

1. Basic Column Creation

2. Practical Credit Risk Example

3. String Manipulation Examples

4. Date Manipulations

5. Advanced Calculations

Common Functions for Credit Risk Analysis:

Best Practices:

  1. Always initialize variables:

  1. Use meaningful variable names

  2. Document calculations

  3. Consider missing values:

Common Pitfalls to Avoid:

  1. Not handling missing values

  2. Not declaring variable lengths for character variables

  3. Not considering the order of operations

  4. Forgetting to format new variables

  5. Not checking for division by zero


Rename a column in-place or otherwise

Renaming Columns in SAS

There are several methods to rename columns in SAS:

  1. Using RENAME= dataset option

  2. Using RENAME statement

  3. Using PROC DATASETS

  4. Using PROC SQL

1. Using RENAME= Dataset Option

2. Using RENAME Statement

3. Using PROC DATASETS (In-Place Renaming)

4. Using PROC SQL

Practical Examples in Credit Risk Context:

1. Comprehensive Example with Multiple Renaming Methods:

2. Renaming Multiple Variables with Pattern:

3. Combining with Other Operations:

Best Practices:

  1. Consistent Naming Convention

  1. Documentation

Common Pitfalls to Avoid:

  1. Not checking if new names already exist

  2. Using reserved words as new names

  3. Not maintaining proper variable lengths after renaming

  4. Forgetting to rename related variables consistently

Quick Reference Table:

MethodUse CaseAdvantagesDisadvantages
RENAME= OptionDuring data step processingClear syntaxCreates new dataset
RENAME StatementSimple renamingEasy to readCreates new dataset
PROC DATASETSIn-place renamingNo new dataset neededMust be careful with active datasets
PROC SQLComplex transformationsFlexibleMay be slower for simple renaming

Change formatting of a column for existing dataset, or while at creation/appending/merge

Format Conversions in SAS

Basic Methods to Change Formats:

  1. Using FORMAT statement

  2. Using INPUT/PUT functions

  3. Using ATTRIB statement

  4. Using PROC DATASETS

1. Text to Number Conversions

2. Number to Text Conversions

3. Date Format Conversions

4. Comprehensive Credit Risk Example

5. Changing Formats for Existing Dataset

6. Common Date Format Conversions

Common Format Types:

Best Practices:

  1. Always validate conversions:

  1. Handle missing values:

  1. Document format changes:


Formats for string, number, date in SAS. How is date stored as and how to handle it

1. String (Character) Formats

2. Numeric Formats

3. Date Formats and Storage

How SAS Stores Dates:

4. Common Date Formats

5. Date Handling and Calculations

6. Datetime and Time Formats

7. Best Practices for Date Handling

Common Date Functions Reference:

FunctionPurposeExample
TODAY()Current datetoday = TODAY();
MDY()Create date from M/D/Ydate = MDY(1,1,2024);
YEAR()Extract yearyear = YEAR(date);
MONTH()Extract monthmonth = MONTH(date);
DAY()Extract dayday = DAY(date);
WEEKDAY()Get day of weekdow = WEEKDAY(date);
INTCK()Count intervalsmonths = INTCK('MONTH', date1, date2);
INTNX()Advance intervalsnext_month = INTNX('MONTH', date, 1);
YRDIF()Calculate yearsyears = YRDIF(date1, date2, 'AGE');

Length, Format, Informat

1. LENGTH Statement

Used to specify the number of bytes for storing variables.

2. FORMAT Statement

Specifies how values are displayed when printed or viewed.

3. INFORMAT Statement

Specifies how SAS should read input values.

Comprehensive Example Using All Three

Common Format Types Reference Table

CategoryFormatExampleDescription
NumericBEST12.1234.56Best representation
 COMMA12.21,234.56With commas
 DOLLAR12.2$1,234.56Currency
 PERCENT8.212.34%Percentage
 Z6.001234Leading zeros
Character$CHAR.TextStandard character
 $UPCASE.TEXTUppercase
 $PROPCASE.TextProper case
DateDATE9.01JAN2024Standard date
 DDMMYY10.01/01/2024DD/MM/YYYY
 DATETIME20.01JAN2024:12:00:00Date and time

Advanced Usage Examples

1. Complex Data Structure

2. Using ATTRIB Statement (Combines LENGTH, FORMAT, and INFORMAT)

3. Handling Special Cases

Best Practices

  1. Define Lengths Early

  1. Use Consistent Formats

  1. Document with Labels


Put, Input, Putn, Inputn and their usage within macro function context

1. Basic Functions Overview

2. PUT vs PUTN

3. INPUT vs INPUTN

4. Macro Context Usage

5. Complex Examples in Credit Risk Context

6. Error Handling Examples

7. Common Use Cases Reference Table

FunctionContextUse CaseExample
PUTData StepFormat numeric for displayPUT(num, DOLLAR12.2)
PUTMacroConvert numeric to character%sysfunc(PUT(&num, 12.2))
PUTNData StepFormat numeric with specific formatPUTN(num, 'COMMA12.2')
PUTNMacroFormat numeric in macro%sysfunc(PUTN(&num, DOLLAR12.2))
INPUTData StepConvert character to numeric/dateINPUT(char, DATE9.)
INPUTMacroParse string in macro%sysfunc(INPUT(&str, 12.2))
INPUTNData StepConvert string to numericINPUTN(char, 'BEST12.')
INPUTNMacroConvert string to numeric in macro%sysfunc(INPUTN(&str, BEST12.))

Best Practices:

  1. Always check for missing values after conversion

  2. Use appropriate format specifications

  3. Handle errors gracefully in macro context

  4. Document format specifications

  5. Consider performance implications in large datasets


PROC SQL

1. Basic SQL Operations

2. Joins

3. Aggregations and Group By

4. Subqueries and Complex Logic

5. Data Modification

6. Advanced Features

7. Complex Example with Multiple Features

Best Practices:

  1. Use Appropriate Joins

  1. Handle Missing Values

  1. Use Indexes for Performance


Merging datasets on both axes including handling mismatching column types, and mismatching column lengths

1. Basic Merge Types

A. Horizontal Merge (MERGE Statement)

B. Vertical Merge (SET Statement)

2. Handling Column Type Mismatches

3. Handling Length Mismatches

4. Complex Merge Scenarios

5. Handling Multiple Types of Mismatches

6. Best Practices and Tips

  1. Always Check Data Before Merging

  1. Document Length Specifications

  1. Handle Missing Values

  1. Use Appropriate Merge Type


SAS string or character functions

1. Basic String Functions

2. String Extraction and Searching

3. Advanced String Manipulation

4. Pattern Matching and Regular Expressions

5. String Functions for Credit Risk Analysis

6. Common String Function Reference Table

FunctionPurposeExampleResult
LENGTHGet string lengthLENGTH("ABC")3
UPCASEConvert to uppercaseUPCASE("abc")"ABC"
LOWCASEConvert to lowercaseLOWCASE("ABC")"abc"
PROPCASEConvert to proper casePROPCASE("john smith")"John Smith"
TRIMRemove trailing spacesTRIM(" ABC ")" ABC"
LEFTRemove leading spacesLEFT(" ABC")"ABC "
COMPRESSRemove charactersCOMPRESS("A B C")"ABC"
SUBSTRExtract substringSUBSTR("ABCDE", 2, 3)"BCD"
SCANExtract wordSCAN("A B C", 2)"B"
FINDFind positionFIND("ABC", "B")2
TRANWRDReplace textTRANWRD("ABC", "B", "X")"AXC"
CATSConcatenate, remove spacesCATS("A ", " B")"AB"
CATXConcatenate with delimiterCATX(",", "A", "B")"A,B"

7. Practical Examples

8. Error Handling with String Functions

Best Practices:

  1. Always handle missing values

  2. Use appropriate string functions for the task

  3. Consider case sensitivity

  4. Use efficient string operations for large datasets

  5. Properly allocate string lengths

  6. Document complex string manipulations


SAS date and time functions

1. Basic Date and Time Functions

2. Interval Differences Between Dates

3. Date Shifting (Past and Future Dates)

4. Date Components and Extraction

5. Complex Date Calculations for Credit Risk

6. Working with Time Components

7. Date/Time Function Reference Table

FunctionPurposeExampleResult
TODAY()Current dateTODAY()Current date
DATETIME()Current datetimeDATETIME()Current datetime
TIME()Current timeTIME()Current time
INTCKCount intervalsINTCK('MONTH', date1, date2)Number of months
INTNXShift date by intervalINTNX('MONTH', date, 1)Next month
YRDIFCalculate years betweenYRDIF(birth_date, today, 'AGE')Age in years
YEARExtract yearYEAR(date)Year number
MONTHExtract monthMONTH(date)Month number
DAYExtract dayDAY(date)Day number
WEEKDAYGet day of weekWEEKDAY(date)1-7 (Sun-Sat)
QTRGet quarterQTR(date)1-4
WEEKGet week numberWEEK(date)Week number

8. Common Date Formats

Best Practices:

  1. Always use appropriate formats for date display

  2. Consider timezone implications

  3. Handle missing dates appropriately

  4. Use consistent date formats across analysis

  5. Document date calculations

  6. Consider business days vs calendar days

  7. Handle leap years appropriately


Sorting a dataset and De-duplicating a dataset on different set of columns or entire dataset

1. Basic Sorting Using PROC SORT

2. De-duplication Using NODUPKEY/NODUPREC

3. Comprehensive Example with Credit Data

4. Advanced De-duplication Using DATA Step

5. Using FIRST. and LAST. Variables

6. Using SQL for De-duplication

7. Complex Sorting and De-duplication

8. Best Practices and Tips

  1. Always Check Results

  1. Document Sort Orders

  1. Handle Missing Values

  1. Performance Optimization

Common Pitfalls to Avoid:

  1. Not checking for missing values

  2. Forgetting to verify record counts

  3. Not considering sort order impact on subsequent processing

  4. Using unnecessary sorting when SQL might be more efficient

  5. Not documenting de-duplication logic


Macro Programming

1. Macro Variables - Basic Creation and Usage

2. Referencing Macro Variables

3. Macro Functions

4. Complex Macro Example for Credit Risk Analysis

5. Advanced Macro Techniques

6. Error Handling and Debugging

7. Utility Macros

Best Practices:

  1. Documentation

  1. Parameter Validation

  1. Scope Control


Creating custom macro functions

1. Basic Macro Function Structure

2. Simple Custom Functions

3. String Manipulation Functions

4. Date Manipulation Functions

5. Validation Functions

6. Complex Business Logic Functions

7. Utility Functions

8. Error Handling Functions

Best Practices:

  1. Documentation

  1. Parameter Validation


Loops within data step and within Macros

1. DATA Step Loops

A. DO Loop (Simple Counter)

B. DO WHILE Loop

C. DO UNTIL Loop

D. Nested DO Loops

E. Complex DATA Step Loop Example

2. Macro Loops

A. %DO Loop

B. %DO %WHILE Loop

C. %DO %UNTIL Loop

D. Complex Macro Loop Examples

3. Combined DATA Step and Macro Loops

4. Error Handling in Loops

Best Practices:

  1. Always initialize counter variables

  2. Use appropriate loop type for the task

  3. Include error handling

  4. Avoid infinite loops

  5. Document loop conditions and exit criteria

  6. Consider performance for large iterations


Conditional blocks within data step and within Macros

1. DATA Step Conditional Blocks

A. Basic IF-THEN-ELSE

B. IF-THEN-DO Blocks

C. SELECT Statements

D. Complex Nested Conditions

2. Macro Conditional Blocks

A. Basic %IF-%THEN-%ELSE

B. Complex Macro Conditions

C. Conditional Macro Execution

3. Combined Complex Examples

Best Practices:

  1. Clear Logic Structure

  1. Error Handling

  1. Documentation

  1. Use Appropriate Conditional Structures


Read/Export from CSV and Excel, Create Inline Datasets, Delete Datasets

1. Reading Files

A. Reading CSV Files

B. Reading Excel Files

2. Exporting Files

A. Exporting to CSV

B. Exporting to Excel

3. Creating Datasets In-Place

A. Simple Dataset Creation

B. Creating Empty Dataset with Structure

4. Deleting Datasets

A. Using PROC DELETE

B. Using PROC DATASETS

5. Complex Examples

A. Comprehensive File Processing

6. Best Practices

  1. Error Handling

  1. Data Validation

  1. Performance Optimization


PROC SUMMARY, PROC FREQ, PROC FORMATS

1. PROC SUMMARY

2. PROC FREQ

3. PROC FORMAT

4. Combined Complex Example

Best Practices:

  1. Use Appropriate Options

  1. Handle Missing Values

  1. Document Format Definitions


 

 

That's all folks 🎉✨