Data Analysis Technology for the Audit Community

 

DATAS for SAS


BASIC PROGRAMS

            Almost any type of computer-based data analysis requires three main steps.  First, the program must be pointed to the data set to be analyzed.  Second, the program must be told what separates one field from another.  Third, the program must be told which field or fields are to be analyzed.  All three steps are needed in DATAS for SAS.  In addition, the user has some options regarding the format of the output and its length (the number of lines of output).

            Because Digitnum.sas is the main program and is run with every analysis, this documentation shows in detail how to set it up.  All the other programs follow the same logic.

Data Profile, Basic Digits Tests, & Number Duplication

            Because these tests are almost always run together when performing Digital Analysis, DATAS 2000 for SAS combines them into a single program.  This saves audit time because only one setup is needed.

            The program that performs these tests is called Digitnum.sas.  All program names in DATAS for SAS are abbreviated to eight characters or fewer so that they can be used on a mainframe without being renamed.

The first section of Digitnum.sas is as follows:

filename analyze 'c:\Testdata\invoices.csv'       ;

%Let countdig=c:\Datas2000_SAS\Output\digitout.csv;

***********************************************;

*Program Options                               ;

***********************************************;

%Let numdups=25                         ;

options nocenter                        ;

***********************************************;

*Read Data File and Print First Page           ;

***********************************************;

data one                                ;

 infile analyze delimiter=',' dsd       ;

 input dept ledger date $ field subset $;

 keep field                             ;

 run                                    ;

Setup of Digitnum.sas:

            The first line of Digitnum.sas,

filename analyze 'c:\Testdata\invoices.csv'       ;

            points SAS to the data set to be analyzed.  The word filename is the SAS keyword that declares a file the program will use.  The word analyze is not a SAS keyword; it is a user-chosen file reference (fileref) that links the filename statement to the infile statement in the data step.  The word analyze could be replaced with (say) “apple”, in which case infile apple would be used in the data one step.
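            As a minimal sketch of the renaming just described, the fileref analyze is replaced with apple in both places.  The path and the input list are copied unchanged from Digitnum.sas; only the fileref differs.

```sas
* The fileref is an arbitrary name chosen by the user ;
filename apple 'c:\Testdata\invoices.csv';

data one;
  * infile must name the same fileref as the filename statement ;
  infile apple delimiter=',' dsd;
  input dept ledger date $ field subset $;
  keep field;
run;
```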

            The second line of Digitnum.sas,

%Let countdig=c:\Datas2000_SAS\Output\digitout.csv;

            tells SAS where to write the output file containing the digit counts.  A user who does not want the output written to an Excel-readable file can disable the macro call that writes it by placing an asterisk (“*”) in front of it, as follows:

*%write (&countdig)        ;

            Under Program Options there is the statement,

%Let numdups=25                         ;

            This option controls the print length (the number of rows) of the various output tables.  The default length of 25 is too short for any real application, and users should reset it to 200 or more.  It is suggested that users first run the program against the test data set to familiarize themselves with the output, and then decide on an appropriate print length for the data to be analyzed.

            The data one step is the only remaining section that requires user modification.
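            For example, the option could be reset as in the first line below.  The proc print step is only an assumed illustration of how a macro variable such as numdups can cap the number of printed rows; it uses the sashelp.class sample table that ships with SAS and is not taken from Digitnum.sas.

```sas
%let numdups=200;

* Hypothetical illustration, not code from Digitnum.sas ;
* obs= stops printing after the first numdups rows      ;
proc print data=sashelp.class(obs=&numdups);
run;
```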

 infile analyze delimiter=',' dsd       ;

 input dept ledger date $ field subset $;

 keep field                             ;

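            The infile statement names the fileref (analyze) of the data set to be analyzed, and delimiter=',' states that commas separate the fields.  The “dsd” option (delimiter-sensitive data) should be used when files are comma delimited: it makes SAS read two consecutive commas as a missing value and ignore commas that appear inside quoted values.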
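            To see what dsd changes, the following small sketch reads two records from in-line data rather than from a file.  The values and the data set name demo are invented for illustration.  In the second record the ledger value is empty; with dsd the two adjacent commas are read as a missing ledger, whereas without dsd SAS would treat them as a single delimiter and shift the remaining values one column to the left.

```sas
data demo;
  * Same infile options as Digitnum.sas, but reading in-line data ;
  infile datalines delimiter=',' dsd;
  input dept ledger date $ field subset $;
  datalines;
10,4500,01/15/99,1250.75,A
10,,01/16/99,310.00,B
;
run;

* Print both records to inspect how each field was read ;
proc print data=demo;
run;
```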

            The input statement lists the fields to be read, in the order in which they appear in the file; a $ after a name marks that field as character rather than numeric.  Digitnum.sas is a Basic Test and consequently analyzes only a single column (field) of numbers.  The field to be analyzed must be named field, and the keep statement retains only field.  For comma delimited files the user needs to read in all the fields at least up to field.  If the file is fixed width then the user need only read in field with a statement of (say),

 


 

Mark J. Nigrini Ph.D.

55 Heath Court, Pennington, New Jersey, 08534

Tel: (609) 303-0533  E-mail: mark_nigrini at msn dot com