Data Analysis Technology for the Audit Community

 

DATAS For Excel

  3 of  5  

Number Frequencies and Round Numbers (Basic_NumberFreq_Round)

A.         The first part of this program prepares a table of the number frequencies showing:

1.       The rank (1, 2, 3, 4, …).,

2.       The amount, and

3.       The frequency.

B.         The second part of this program calculates the proportions of numbers that are round numbers.  Round numbers are defined to be numbers that are divisible by the following without leaving a remainder:

1.       5,

2.       25,

3.       100, and

4.       1,000.

            For each of the multiple numbers the output table shows:

1.       The actual count of numbers divisible by 10, 25, 100 and 1,000,

2.       The actual proportions,

3.       The expected proportions,

4.       The number of observations greater than or equal to 1.00.

The round numbers part of the program is only meant for numbers that are integers (not dollars and cents).  Consequently:

The Round Number statistics are based on the integer portion of the number only

- all digits to the right of the decimal point are deleted before the program checks whether the number is round or not.

- numbers such as 450.02 and 50.50 will be analyzed as if they were 450 and 50

- both 63,400.21 and 505,000 would be multiples of 100

- 63.40 would not be a multiple of 5

- 50.05 would be a multiple of 25

The Round Number Statistics are only calculated for those numbers that are 1.00 or larger (Obs >= 1.00).

Your data can include number values under 1.00 these numbers will be automatically deleted during the round number calculations

The Number Frequencies are calculated for all numbers, i.e., negative numbers and zeros can be included in the data set

Last-Two Digits (Basic_Last2_Digits)

            This program does the calculations and prepares a graph and table for the last-two digits test.  The last-two digits test is only meant for numbers that are integers (not dollars and cents).  Consequently:

The Last-two digits are based on the integer portion of the number only for numbers 10.00 and higher.  The last-two digits are the tens and units digits.

The graph includes upper and lower confidence bounds showing the limits for differences at the 5 percent level of significance.  The expected proportion for all the last-two digit combinations (00 to 99) is 0.01 (or 1 percent).

The table follows the same format as the tables in the Basic_ProfileFirSecFir2 output.

Note:

            The Basic Tests are all tests that are run against a single column of numbers.  Nothing other than the column of numbers (a single field is considered).

ADVANCED TESTS (NINE PROGRAMS)

The Advanced Tests analyze (1) a field of numeric data, and (2) one or more fields representing subset variables.  These programs have shown themselves to be powerful error-detecting technologies.

Profile Details (Subset_Profile_Details)

This program calculates subset details for each of the data profile categories of the Data Profile program.  The subsets used are usually vendors but could also be, for example, dates, zip codes, or departments.  The amounts for each subset are summed.  The output tables show:

1a.        The largest subsets in the over $10 category by amount (e.g., total dollars).

1b.        The largest subsets above can be resorted by frequency (e.g., number of invoices).

2.         The largest subsets in the 0.01 to $9.99 category by amount (can be resorted by frequency).

3.         The largest subsets for the zero dollars category by frequency.

4.         The largest subsets in the -0.01 to -$9.99 category by amount (can be resorted by frequency).

5a.        The largest subsets in the under -$10 category by amount.

5b.        The largest subsets above can be resorted by frequency (e.g., number of invoices).

6.         The largest subsets in the 0.01 to 50.00 category by amount (can be resorted by frequency).

The table shows the largest subsets for (1) to (6) and is used for refunds, reimbursements, or disbursements.

            The program might give a macro error message during execution if one of the categories has a very few entries.  This is because the program is told to copy the formula down x number of rows, but x is zero which means that no copying can be done.  Click on Continue and the program will continue without error.

Number Frequency Factor (Subset_Same_Numbers)

This program lists the subsets for which all the number amounts are all equal.  An example would be a landlord that was paid 12 rent payments that were all equal amounts.  The program prepares a table showing:

1.       The subset reference,

2.       The number of observations,

3.       The number of observations that are the same (should be the same as (2) above).

4.       The number that is repeated, and

5.       The total of the number amounts for that subset.

            The output is sorted by number of observations (decreasing).  The output can be resorted by another criteria such as the Subset number total.

            This program only considers number amounts of 0.01 or larger and subsets with more than one entry.  The input can include numbers under 0.01 and subsets with only one entry.  These entries will be automatically deleted during processing.  

Number Duplication Two (Subset_Numbers_Occur_Twice)

This program lists subsets that had a number amount occur twice (and twice only).  If a subset had two (or more) amounts occur twice, the subset will be listed multiple times.  A subset with 63.40, 63.40, 110,364, 2,204, and 2,204 would be listed twice (once for 63.40 and once for 2,204).  This is a basic test for duplicate payments or errors in data files where entries are duplicated after a file is created from multiple sources.  The output table shows:

1.       The subset reference,

2.       The amount that occurred exactly twice, and

3.       The count for the amount (which is 2 in every case)

       This program reports the Subsets where a number occurred twice and twice only where the items are zero or larger.  Your data can include field values under zero and subsets with only one entry, these items/subsets will be automatically deleted during processing.

  3 of  5 

TOP

 

 

Mark J. Nigrini Ph.D.

55 Heath Court, Pennington, New Jersey, 08534

Tel: (609) 303-0533  E-mail: mark_nigrini at msn dot com