BASIC PROGRAMS
For almost any type of data analysis using a computer,
three main steps are needed. First, the program needs to be
pointed to the data set to be analyzed. Second, the program needs
to be told what separates one field from another.
Third, the program needs to know which field or fields are to be
analyzed. All three steps are required in DATAS for SAS. In
addition, the user has some options regarding the format and the amount
(number of lines) of output.
Since Digitnum.sas is the main program and is run with every
analysis, this documentation shows in detail how this program is to
be set up. All the other programs follow the same logic.
Data Profile, Basic Digits Tests, & Number Duplication
Since these tests are almost always run together when performing Digital
Analysis, DATAS 2000 for SAS combines them in one program. This saves
audit time because only one setup is needed.
The program to perform these tests is called Digitnum.sas. The
program names are all abbreviated in SAS (to eight characters or fewer)
so that they can be used on a mainframe without modifying the names.
The first section of Digitnum.sas is as follows:
filename analyze 'c:\Testdata\invoices.csv' ;
%Let countdig=c:\Datas2000_SAS\Output\digitout.csv;
***********************************************;
*Program Options                               ;
***********************************************;
%Let numdups=25 ;
options nocenter ;
***********************************************;
*Read Data File and Print First Page           ;
***********************************************;
data one ;
infile analyze delimiter=',' dsd ;
input dept ledger date $ field subset $;
keep field ;
run ;
Setup of Digitnum.sas:
The first line of Digitnum.sas,
filename analyze 'c:\Testdata\invoices.csv' ;
points SAS to the data set to be analyzed. The word filename
is the SAS keyword that associates a name with an external file so that
the program can refer to it. The word analyze is not a SAS term; it is
a user-chosen name (a fileref) that links the filename statement to the
matching infile statement. The word analyze could be replaced with
(say) “apple”, in which case we would use infile apple in the Data one
statements.
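As a minimal sketch of the renaming just described, the two statements could read as follows. The name apple is purely illustrative; the file path and input statement are those of the shipped example. Any valid SAS fileref (at most eight characters) would serve equally well, provided the same name appears in both places.

```sas
* The fileref name is arbitrary but must match in both statements ;
filename apple 'c:\Testdata\invoices.csv' ;

data one ;
infile apple delimiter=',' dsd ;
input dept ledger date $ field subset $;
keep field ;
run ;
```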
The second line of Digitnum.sas,
%Let countdig=c:\Datas2000_SAS\Output\digitout.csv;
tells SAS where to write the output file containing the digit counts.
If the user does not want the output to be written to an Excel-readable
file, the macro call that writes the output can be commented out by
placing an asterisk “*” in front of it, as follows,
*%write (&countdig) ;
Under Program Options there is the statement,
%Let numdups=25 ;
This option allows the user to control the print length (number of lines)
of the various output tables. The default length of 25 is really too
short for any application; users should reset it to a number that is 200
or larger. It is suggested that users first run the program against the
test data set to familiarize themselves with the output, and then
decide on an appropriate print length for the data to be analyzed.
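For instance, following the suggestion above, the option line could be changed as sketched here. The value 200 is simply the minimum recommended in the text, not a setting tuned to any particular data set.

```sas
* Print up to 200 lines in each output table instead of the default 25 ;
%Let numdups=200 ;
```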
The Data One paragraph is the only remaining step requiring user
modifications.
infile analyze delimiter=',' dsd ;
input dept ledger date $ field subset $;
keep field ;
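The infile statement refers to the data set to be analyzed through the
name analyze, and delimiter=',' states that commas separate the fields.
The “dsd” option should be included when files are comma delimited: it
makes SAS treat two consecutive commas as an empty (missing) field and
allows values enclosed in quotation marks to contain commas.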
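The effect of dsd can be seen in a small self-contained sketch. The two data rows below are invented for illustration and are not taken from invoices.csv; the data set name demo is likewise hypothetical. The second row has an empty ledger field.

```sas
* Illustrative rows only, not taken from invoices.csv ;
data demo ;
infile datalines delimiter=',' dsd ;
input dept ledger date $ field subset $;
datalines;
10,4100,01/15/99,1523.75,A
10,,01/16/99,98.10,B
;
run ;

proc print data=demo ;
run ;
```

With dsd, the empty field in the second row is read as a missing ledger value and field still receives 98.10. Without dsd, SAS treats the two adjacent commas as a single delimiter, so the values after the gap shift one position to the left and field no longer receives the intended number.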
The input statement lists the fields to be read, in the order in which
they appear in the file; a $ after a name marks a character field.
Digitnum.sas is a Basic Test and consequently analyzes only a single
column (field) of numbers. The field to be analyzed must be named field,
and the keep statement retains only that field.
For comma delimited files the user needs to read in all the fields up to
and including field. If the file is fixed width then the user need
only read in field with a statement of (say),