Data Analysis Technology for the Audit Community

 

DATAS For Excel

  5 of  5 

NEW SYSTEM AND OTHER TESTS (ONE PROGRAM)

My Law (Other_My_Law)

            This program compares the digit patterns of two data sets.  Examples could be the output before a system (or Y2K) conversion and the output after a system (or Y2K) conversion.  The program could also be used to compare two time periods to find errors even when there is no system change involved.

            The user inputs the before first-two digit proportions in column K starting at K2.  A message in M26 will signal a warning if the before proportions do not sum to 1.00.

            The after data is entered in column F.

The program will produce the following output:

1.       The after graph minus the before graph (this shows the differences between the two graphs),

2.       A detailed table of the digit frequencies.

            The after data set may only have numbers that are 1.00 and larger.  If numbers below 1.00 are included the program will fail and many #NUM errors will be displayed.

DATASâ 2000 FOR EXCEL: DESCRIPTION OF TABLES

            The programs that produce graphs:

            Basic_ProfileFirSecFir2

            Basic_First3

            Basic_Last2

            Other_My_Law

            also include accompanying tables.  The format of these tables is the same throughout.  The format of the Basic_ProfileFirSecFir2 tables is described below.

Contents of Tables

The digit programs produce tables and graphs.  The Basic_ProfileFirSecFir2 table is described below with the paragraph letter indicating the column in the spreadsheet.

A.         (Obs#).  Observation number.  The programs include one or more columns for transaction details.  You may include only enough details to trace back to the original transactions in your original data set.

B.         ($).  The data to be analyzed should be entered in this column.  The first observation must be in cell B2 with the remaining observations filling the column downwards.  Do not include blanks or nonnumeric data (e.g., LK6340).  For example, if the data set contains 200 observations the input range should be B2..B201 (or B2:B201).

C.         (TwoDig).  This column converts each observation to a number in the range [10,100), by moving the decimal point.  Each number in column C is therefore greater than (and including) 10 and less than (and excluding) 100.  This calculation is needed to identify the first-two digits.  The formula in this column is copied from C2 downwards.

D.         (First 2).  This column contains the first two-digits of the observations.  It is therefore an integer in the range [10,100).  The formula in this column is copied from D2 downwards.

E.         (First).  This column contains the first digit of the observation which is an integer in the range [1, 2, ...,9].  The formula in this column is copied from E2 downwards.

F.         (Second).  This column contains the second digit of the observation which is an integer in the range [0, 1, ...,9].  The formula in this column is copied from F2 downwards.

G.         (Digit).  This column contains the bin ranges for the frequency counts.  G2..G10 is the bin range for the first digits and G12..G21 is the bin range for the second digits.

H.         (Count).  This column contains the counts of the digits to the left of the count number.  H2..H10 contains the first digit frequencies.  For example, the number "28" in H3 would indicate that there are 28 first digit 2's in the data set.  H12..H21 contains the second digit frequencies.  The number "27" in H13 would indicate that there are 27 second digit 2's in the data set.  The sum of the first and second digit counts (H2..H10 and H12..H21) must equal the number of observations.

I.          (Prop).  This is the proportion that each count represents.  For example, if the data set has 200 observations and H3 had a value of 28, then the proportion would be 0.140 (28/200).

J.          (Ben Law).  This column contains the expected proportions of Benford's Law.  J2..J10 contains the expected first digit proportions and J12..J21 contains the expected second digit proportions.  The proportions in each row are matched horizontally with the digit that it relates to.  For example, J3 contains the expected proportion of 2 as a first digit.

K.         (Diff).  This column shows the difference between the actual proportions (column I) and the expected proportions (column J).  A positive difference indicates that the actual proportion (column I) exceeds the expected proportion (column J).

L.         (Signif).  This column shows the statistical significance of the difference between the two proportions.  The Z-stat in row 2 measures the statistical significance of the first digit 1 difference.  Significance takes into account the size of the difference (over or under), the expected proportion, and the sample size.  Scores above 1.96 are significant at the 0.05 level, and above 2.57 are significant at the 0.01 level.

M.        (FTDigit).  This column is the bin range for the first two-digit frequency counts.  M2..M91 contains the first-two-digits 10 to 99.

N.         (Count).  This column contains the actual counts of the first two-digit combinations to the left of the count number.  For example, the number "27" in N3 would indicate that there are 27 numbers that start with the first two-digit combination of 11.  The sum of the first-two digit counts (N2..N91) should equal the total number of observations in the data set.

O.         (Prop).  This is the proportion that each count represents.  For example, if the data set has 200 observations and N3 equaled 27, then the proportion would be 0.135 (27 / 200).

P.         (Ben Law).  This column contains the expected proportions of Benford's Law matched rowwise with the appropriate first-two digits.  For example, P3 contains the expected proportion of 11 as a first-two digit combination.

Q.         (Diff).  This is the difference between the actual proportions (column O) and the expected proportions (column P).  A positive difference indicates that the actual proportion (in column O) exceeds the expected proportion (in column P).

R.         (Signif).  This column shows the statistical significance of the difference between the two proportions.  The Z-stat in row 2 measures the statistical significance of the first-two digits 10 difference. Significance takes into account the size of the difference, the expected proportion, and the sample size.  Scores above 1.96 are significant at the 0.05 level, and above 2.57 are significant at the 0.01 level.

 

  5 of  5 

TOP

 

 

Mark J. Nigrini Ph.D.
606 Rockcrossing Lane, Allen, Texas 75002
Tel: (972) 359-0020  E-mail: mark_nigrini at msn dot com