
SAS Macro to semi-efficiently manipulate data


Tag: sas,sas-macro

Objective: go from the Have table plus the Help table to the Want table. The current implementation (below) is slow. I believe this is a good example of how not to use SAS macros, but I'm curious about two things:

1. whether the macro approach could be salvaged or made fast enough to be viable (e.g. PROC APPEND is supposed to speed up stacking datasets, but I was unable to see any performance gain), and
2. what the alternatives would look like.

I have written a non-macro solution that I will post below for comparison's sake.

data have ;
input name $ term $;
datalines;
Joe   2000 
Joe   2002
Joe   2008 
Sally 2001
Sally 2003
; run; 

proc print ; run; 

data help ;
input terms $ ;
datalines;
; run;

proc print ; run; 

data want ;
input name $ term $ status $;
datalines;
Joe   2000  here
Joe   2001  gone
Joe   2002  here
Joe   2003  gone
Joe   2004  gone
Joe   2005  gone
Joe   2006  gone
Joe   2007  gone
Joe   2008  here
Sally 2001  here
Sally 2002  gone
Sally 2003  here
; run; 

proc print data=want ; run;

I can write a little macro to get me there for each individual:

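%macro heregone(name); /* macro name is illustrative; the original definition line was lost */
proc sql ;
create table studtermlist as
select distinct term
from have
where NAME = "&NAME"
;
create table HEREGONE_&Name as
select A.terms ,
"&Name" as Name,
case when term is null THEN 'Gone'
else 'Here'
end as status
from termlist a left join studtermlist b
 on a.terms eq b.term
;
quit;
%mend heregone;

%heregone(Joe)
%heregone(Sally)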


proc print data=HEREGONE_Joe; run; 
proc print data=HEREGONE_Sally; run; 

But that only covers one name at a time. To loop through all of them (presumably quite a few names):

*******need procedure for all names - grab info on have ;
proc sql noprint;
select distinct name into :namelist separated by ' '
from have
; quit;

%let n=&sqlobs ;

%macro stackall; /* wrapper name is illustrative: %DO loops must run inside a macro */
%do i = 1 %to &n ;
 %let currentvalue = %scan(&namelist,&i);
 %put &currentvalue ;
 %put &i ;
 /* HEREGONE_&currentvalue is created here with the per-name macro above */
 %if &i = 1 %then %do ;
  data base; set HEREGONE_&currentvalue; run;
 %end ;
 %else %do ;
  proc sql ; create table base as
  select * from base
  union all
  select * from HEREGONE_&currentvalue;
  drop table HEREGONE_&currentvalue;
  quit;
 %end ;
%end ;
%mend stackall;
%stackall

proc sort data=base ; by name terms; run;
proc print data=base; run;

So now I have want, but with 6,000 names, it takes over 20 minutes.
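On point 1, here is a minimal sketch of the stacking loop rebuilt around PROC APPEND. It assumes every HEREGONE_<name> table already exists with identical column attributes. Two toy tables stand in for them here. STACK_HEREGONE is a hypothetical macro name; it takes a space-separated name list, such as the &namelist built by the PROC SQL step above. PROC APPEND adds the new rows to the end of BASE without reading the rows already there. So, unlike CREATE TABLE base AS SELECT * FROM base ..., an iteration no longer copies everything stacked so far. The several PROC steps per name remain, though, so with thousands of names this may still be too slow.

```sas
/* Toy stand-ins (assumption) for two HEREGONE_<name> tables that   */
/* share identical column attributes, as PROC APPEND expects.       */
data heregone_joe;
  input terms $ name $ status $;
  datalines;
2000 Joe Here
2001 Joe Gone
2002 Joe Here
;
run;

data heregone_sally;
  input terms $ name $ status $;
  datalines;
2001 Sally Here
2002 Sally Gone
2003 Sally Here
;
run;

/* Hypothetical macro: stacks HEREGONE_<name> for each name in the  */
/* space-separated NAMELIST onto WORK.BASE, then deletes the piece. */
%macro stack_heregone(namelist);
  %local i currentvalue;
  /* Start from an empty BASE so a rerun does not stack rows twice. */
  %if %sysfunc(exist(work.base)) %then %do;
    proc datasets library=work nolist;
      delete base;
    quit;
  %end;
  %do i = 1 %to %sysfunc(countw(&namelist, %str( )));
    %let currentvalue = %scan(&namelist, &i, %str( ));
    /* PROC APPEND adds rows to the end of BASE without reading the  */
    /* rows already there, and creates BASE if it does not exist.    */
    proc append base=work.base data=work.heregone_&currentvalue;
    run;
    proc datasets library=work nolist;
      delete heregone_&currentvalue;
    quit;
  %end;
%mend stack_heregone;

%stack_heregone(Joe Sally)
```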


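Here is an alternative solution. For each name, find the min/max term with PROC SQL. Then use a DATA step to create a table with every term in that range, and left-join it to your original table to flag each term as Here or Gone. Note that TERM is read as numeric here so that it can drive the DO loop.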

*Sample data;
data have ;
input name $ term ;
datalines;
Joe   2000 
Joe   2002
Joe   2008 
Sally 2001
Sally 2003
; run; 

*find min/max of each name;
proc sql;
create table terms as
select name, min(term) as term_min, max(term) as term_max
from have
group by name
order by name;
quit;

*Create table with the time periods for each name;
data empty;
set terms;
do term=term_min to term_max;
  output; /* one row per term in the name's range */
end;
drop term_min term_max;
run;

*Create final table by merging the original table with table previously generated;
proc sql;
create table want as
select a.name, a.term, case when missing(b.term) then 'Gone'
                        else 'Here' end as status
from empty a
left join have b
on a.name=b.name and a.term=b.term
order by a.name, a.term;
quit;
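The question starts from a Help table that lists every term. Because of that, the DATA step that builds EMPTY could be replaced by a join against Help. This sketch makes three assumptions: TERM in HAVE and TERMS in HELP are both numeric (the question reads them as character), HELP covers every term in each name's range, and the 2000 to 2008 HELP rows below are toy data. WANT_HELP is a hypothetical output name.

```sas
/* Toy inputs (assumption): numeric TERM and TERMS. */
data have;
  input name $ term;
  datalines;
Joe 2000
Joe 2002
Joe 2008
Sally 2001
Sally 2003
;
run;

data help;
  do terms = 2000 to 2008;   /* toy list of every term */
    output;
  end;
run;

proc sql;
  create table want_help as
  select g.name, g.term,
         case when missing(b.term) then 'Gone' else 'Here' end as status
  /* G: every Help term between each name's first and last term */
  from (select r.name, h.terms as term
          from (select name, min(term) as term_min, max(term) as term_max
                  from have
                  group by name) as r,
               help as h
          where h.terms between r.term_min and r.term_max) as g
       left join
       (select distinct name, term from have) as b
       on g.name = b.name and g.term = b.term
  order by g.name, g.term;
quit;
```

For the sample data this gives the same 12 rows as WANT. The row range comes from HELP rather than from a DO loop, so terms missing from HELP are skipped.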

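EDIT: Looking at your macro solution now, part of the problem is that you're scanning your tables too many times. HAVE is read once per name to build STUDTERMLIST, and BASE is copied in full every time another name is added to it.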
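One way to avoid the repeated scans is a single DATA step that reads HAVE once, sorted by name and term. For each name it writes a 'Gone' row for every gap between consecutive terms. This sketch assumes TERM is numeric with one term per year. WANT_ONEPASS and the helper variables PREV_TERM and CURRENT are hypothetical names. NODUPKEY drops repeated name/term pairs so that a duplicate term is not reported twice.

```sas
/* Toy input (assumption): numeric TERM, as in the answer above. */
data have;
  input name $ term;
  datalines;
Joe 2000
Joe 2002
Joe 2008
Sally 2001
Sally 2003
;
run;

proc sort data=have out=have_sorted nodupkey;
  by name term;
run;

data want_onepass;
  set have_sorted;
  by name;
  length status $4;
  retain prev_term;
  /* Within a name, emit 'Gone' for each term skipped since the last one. */
  if not first.name then do;
    current = term;
    status = 'Gone';
    do term = prev_term + 1 to current - 1;
      output;
    end;
    term = current;   /* restore the observed term after the loop */
  end;
  status = 'Here';
  output;
  prev_term = term;
  drop prev_term current;
run;
```

With the sample data this produces 9 rows for Joe (2000 to 2008) and 3 for Sally (2001 to 2003), matching WANT, and HAVE is read only once.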


SAS Macro to Combine Municipal Proc SQL Statements Based on Date Criteria

I have a series of proc sql statements which pull data for Active, Inactive and Lapsed customers. I end up with 3 tables. *Customers_Active *Customers_InActive *Customers_Lapsed Active: 0-12M purchaser Inactive: 13-24M purchaser, did not purchase 0-12M (active day range minus 12 months) Lapsed: 25-36M purchaser, did not purchase 0-24M (inactive...

SAS proc sql with when then statements

I have a "time" var of years in my data. I need to create a new var based on the following with PROC SQL if time>mean(time)then new var=1 else, new var=0 I keep getting different error, how can I improve my code? proc sql; create table v3 as select*,case when...

Replicating random normals generated in SAS (rannor) in R, based on the same seed?

Given the same seed, is there a way to produce the exact same random normal numbers generated in SAS, using the rannor function, in R?

SAS: assign a quantile to a macro variable

In SAS, how can I assign the 97.5% quantile of the normal distribution to the macro variable z? Not working 1 %let z = quantile("normal", 0.975); Not working 2 %let z = %sysfunc(quantile("normal", 0.975)); ...

changing the order of values in a variable in sas data (row-wise ordering )

My sasdata output is like this. Can we make the values in a particular order ?(change the order of rows) ie In the order of "less than 10 lakh, between 10-20 lakh,between 21-30.... above 1 crore". I want to change the row order. Right now rows are ordered in ascending...

SAS: Looping over column names

I have a data set with the following structure: data account; input Index c1 c2 c3 c4 c5 c6 ; datalines; 4 30 20 10 30 40 20 3 50 20 30 50 10 20 ; run; In my file, there are 150+ columns of the "c"-Type containing numbers. In...

sas: proc sql select into with more than one output

I have following dataset data height; input name $ var $ value; datalines; John test1 175 Peter test1 180 Chris test1 140 John test2 178 Peter test2 182 Chris test2 148 ; run; I would like to make mean value of 2 tests for each students I able to make...

SAS Gplot overlay line plots

I am trying to plot two sets of line graphs on the same chart: /* make data */ data test ; do i = 1 to 2 ; do x = 1 to 5 ; y = i*x ; z = 0.5*i*x ; output; end ; end ; run ;...

SAS Proc SQL how to perform procedure only on N rows of a big table

I need to perform a procedure on a small set (e.g. 100 rows) of a very big table just to test the syntax and output. I have been running the following code for a while and it's still running. I wonder if it is doing something else. Or what is...

Sas .dat file without column headings only 1st row being read in sas studio

I'm using Sas studio university edition. I have a dat file without any column headings (4 columns). I'm trying to read it in with data van; infile "/folders/myfolders/test2/psek-win.dat"; input a $ b $ c $ d $; run; i.e. create my own names for the columns. It works, but only...

SAS Concatenate Multiple Variables to Create Data-Driven Macro Statements

In order to keep my process data-driven, I'm trying to concatenate multiple variables, separated by commas, in order to ultimately put them in a PROC SQL list to call in multiple macro statements that would otherwise clutter my SAS program. Take the following sample dataset: DATA TEST; INPUT YEAR CONDITION...

SAS: getting the filesize of created DBF file

I have a SAS stored process that creates a DBF file from SAS data set rr_udf_value and finds its size (F_SIZE): filename dbfout "/SASInside/DBF/myfile"; proc export data=rr_udf_value outfile=dbfout dbms=dbf replace; run; %let f_nm=/SASInside/DBF/myfile.DBF; %let rc=%sysfunc(filename(onefile, &f_nm.)); %let fid=%sysfunc(fopen(&onefile)); %let F_SIZE=%sysfunc(finfo(&fid,File Size (bytes))); %put &F_SIZE; The problem is that the variable F_SIZE is...

Convert character string to date format

I'm trying to convert a character string of $40. to date format. Below is the column Month in the dataset test2 and its values: Month Apr 15 May 15 Jun 15 I have tried this code but not getting the result I'm expecting. data test; set test2; Month =inPUT(month,monyy5.); /*...

How to count the number of same items on two different lists in SAS

I am looking for the best way to count the number of executives shared by two firms. Currently, the data is arranged such that each row contains two firm IDs and a list of identifiers for each the board members of each firm. Currently, I have managed to obtain what...

Regex with whitespaces and preceding zeros

I want to match the string 11 with a regular Expression in SAS. The 11 can be preceded by zero or more 0 and/or by white spaces. Any other character is not allowed. Likewise, if anything there should only be white spaces following the 11. Examples: Match: 0000011 11 11<space><space>...

Duplicates issue

I have a problem with duplicates. Actually what I need is only the see duplicates but my table has many variables something like the below: a b c d e 32 ayi dam som kem 32 ayi dam som tws 32 ayi dam tsm tws 12 mm ds de ko...

SAS proc sql - how to read in log of variable but retain the variable's label

I am reading in variables from a few different datasets using proc sql, which I am trying to improve on. What I'd like to do is read in a variable from a dataset using proc sql, but take the log of the variable as it's read in, but keep the...

Which is faster, where statement or where data set option

The question is really straightforward: which one is faster? Consider a DATA step with two datasets in the SET statement, where the datasets have the same variables. From what I've heard and read, if we subset them using the same condition, say date = "10jan2014"d,...

An efficient way to Copying values in subsequent records - SAS

I have a dataset that is grouped by category variables in the source data. For example: Bar | Foo1 | Foo2 | Foo3 Bar2 | Foo4 | Foo5 | Foo6 After I import the source data, the above would result in the first Variable (Parent) being populated on Record 1...

PROC SQL Update Efficiency for Large Datasets

I have a SAS Master Dataset with 10 Million Rows and 1800 columns. I need to update 10 columns using a transaction dataset with 15 million records, only for records with matching key. I tried running a proc sql update statement using the following code. proc sql; UPDATE lib1.master1 a...

SAS drop multiple variables indexed by tens

My question is likely stupid but I have not found an answer yet. I have a variable var index by tens : var10, var20... var90. At some point of my code I want to drop all of them. I can do data want(drop=var10 var20 var30 var40 var50 var60 var70 var80...

calculate market weighted return using SAS

I have four variables Name, Date, MarketCap and Return. Name is the company name. Date is the time stamp. MarketCap shows the size of the company. Return is its return at day Date. I want to create an additional variable MarketReturn which is the value weighted return of the market...

SAS: conditional statement error?

Could you please help me understand why this statement is incorrect (from a quiz). For some reason I can't see a problem. if total = 140 then status EQ 'works'; Thanks!...

Support results in association rules are less than 5%

I am facing an issue with Association rules. I have a dataset which consists of transaction ID and ProductID I have edited the variable and changed TransactionID role to "ID" and productID role to "Target" The minimum support % is set to 5%. But when i run the association i...

SAS how to get random selection by group randomly split into multiple groups

I have a simple data set of customers (about 40,000k) It looks like: customerid, group, other_variable a,blue,y b,blue,x c,blue,z d,green,y e,green,d f,green,r g,green,e I want to randomly select for each group, Y amounts of customers (along with their other variable(s). The catch is, i want to have two random selections...

Probt in sas for column of values

I'm looking to do a probt for a column of values in SAS, not just one, and to give two-tailed p values. I have the following code I'd like to amend: data all_ssr; x=.551447; df=25; p=(1-probt(abs(x),df))*2; put p=; run; However, I would like x to be a column of values...

SAS Find Top Combinations in Dataset

Hello everyone -- I have some sales data which looks like this: data have; input order_id item $; cards; 1 A 1 B 2 A 2 C 3 B 4 A 4 B ; run; What I'm trying to find out is what are the most popular combinations of items...

how to calculate weighted average but exclude the object itself using SAS

There are four variables in my dataset. Company shows the company's name. Return is the return of Company at day Date. Weight is the weight of this company in the market. I want to keep all variables in the original file, and create an additional variable which is the market...

select maximum value of common column for multiple data set

I have a daily schedule process flow which refreshes a bunch of tables within the same library. At the end of the process flow, all tables should have the same up to date records. And I want to double check this via checking the maximum value of date. But problem...

SAS else if clause confusion

I'm running the following code: data new; set old; if visits=. then band='Poor'; else if visits=1 or visits=2 then band='Low'; else band='High'; run; My confusion is when the else if statement is changed to: else if visits=1 or 2 then band='Low'; Why does the value Low appear as the band...

SAS proc ttest modify HO

I have a data with GENDER=(1/0) INCOME SENIORITY=(1/0). I need to run a ttest on INCOME by GENDER for SENIORITY=1. As far as I know, the default of HO=0, which means that there is no difference between the genders, but how can I define an HO that will check if...

Insufficient authorisation to lst in SAS batch job

Today I faced a problem and solved it, but I am not quite sure why the problem occurred. We have a SAS batch job: /path_to_script/ -log /some_path/Logs/replication_#Y.#m.#d_#H.#M.#s.log -batch -noterminal -logparm "rollover=session" -sysin /another_path/macros/ And today the job fell over with error: ERROR: Insufficient authorization to access /sas_path/sasconfig/Lev1/SASApp/replication.lst. I found that...

sas macro: argument to be a word in filename [duplicate]

This question already has an answer here: Why won't my macro variable resolve? 1 answer I have similar files in a specific folder. I need to run same program for every files. So I thought of using macro. But I encountered a problem. %macro xyz(cityname); *IMPORTING FILE; proc import...

How to Declare Global Array Variable in SAS?

I'm new to SAS and spinning my wheels. The SAS documentation and other Google searches have not helped me figure this out. How can I declare a global array variable that I can use in various procedures to loop through the contents? Here is what I've tried: %let fileArray =...

Extract ID's separated by dashes from text string

I have a dataset that has one concatenated text field. I am trying to break it into three text columns in SAS 9.4. Obs Var1 1 MAY12-KANSAS-ABCD6194-XY7199-BRULE 2 JAN32-OHIO-BZ5752-GARY My output for observation 1 should look like this: Obs Date State ID 1 MAY12 KANSAS ABCD6194-XY7199-BRULE Here's what I have,...

create a macro in sas

I have a report that is generated once a year. each report has the form of the year inside the name - report-2011.xls, report-2012.xls etc. each report contains the following vars: ID, SAL=average monthly salary of that year, Gender (0=male, 1=female), Married (0=not married, 1=married), I need to create a...

randomly select two observation and calculate the distance

I have a data set have with numerical column x. I want to randomly select any two distinct points and then calculate the distance between them. If I only do it once, then I just use proc surveyselect to generate another data set with two obs. proc surveyselect data=have out=want...

Which one is default ODS destination in SAS - Listing or HTML?

Which one is default ODS destination in SAS - Listing or HTML? In SAS BASE Prep book, it says Listing but in Step by Step SAS programming it says HTML.

How to change my SAS code to find the maximum number

I have a dataset looks like this. I want to create another variable which represent the total trading volume each day. My code shown below. But it seems that there is something wrong with my code, the calculated maximum trading volume N is wrong somehow. Can anyone tell me know...

Using for loop indices in variable generation SAS

I would like to set up a for loop in SAS to create time-dependent tables. The idea is rather simple. I have multiple tables that I would like to left join, and I would like to do this operation for every month. I don't have...

sas dynamic call symput with unknown number of fields in the dataset

i have following dataset data parm2; input a b c d e; datalines; 1 2 3 4 A ; run; Problem1: I would like have a set of macro variables. Assume i do not know the number of fields and its corresponding name of the field. Problem2: fields are not...

sas create a variable that is equal to obs column

I have a file with 10 obs. and different parameters. I need to add to my data a new variable of 'ID' for each observation- i.e a column of numbers 1-10. How can I add a variable that is simply equal to the obs column? I thought about doing it...

What's wrong with these macro parameters?

I have the following simplified version of a piece of code that I am working on: %macro test(var); %if &var = 'Sub Prime' %then %do; %let var2 = 'Sub_Prime'; %put &var2; %end; %mend; %test(Sub%str( )Prime); Basically the point of this is that if var = 'Sub Prime' that var2 should...

How to change the column headers of a sas dataset into an observation?

I have created a sas code which generates many sas datasets. Now I want to append all of them to a single excel file . So first I want to convert all the column headers of sas datasets as first observation. Then leave space between these datasets (adding a blank...

merging all columns in sas dataset who has column “shiyas” in header

I have a sas dataset with columns shiyas1,shiyas2,shiyas3 in it. That dataset has some other columns also. I want to combine all the columns with header with shiyas in it. We can't use cats(shiyas1,shiyas2,shiyas3) because similar datasets have columns upto shiyas10. As I am generating general sas code, we cannot...

SAS: How can I filter for (multiple) entries which are closest to the last day of month (for each month)

I have a large Dataset and want to filter it for all rows with date entry closest to the last day of the month, for each month. So there could be multiple entries for the day closest to the last day of month. So for instance: original Dataset date price...