Number
Frequencies and Round Numbers (Basic_NumberFreq_Round)
A. The first part of
this program prepares a table of the number frequencies showing:
1.
The rank (1, 2, 3, 4, …).,
2.
The amount, and
3.
The frequency.
B. The second part of
this program calculates the proportions of numbers that are round numbers.
Round numbers are defined to be numbers that are divisible by the
following without leaving a remainder:
1.
5,
2.
25,
3.
100, and
4.
1,000.
For
each of the multiple numbers the output table shows:
1.
The actual count of numbers divisible by 10, 25, 100 and 1,000,
2.
The actual proportions,
3.
The expected proportions,
4.
The number of observations greater than or equal to 1.00.
The round
numbers part of the program is only meant for numbers that are integers
(not dollars and cents). Consequently:
The Round Number statistics are based on the integer portion of the
number only
- all digits to the right of the decimal point are deleted before
the program checks whether the number is round or not.
- numbers such as 450.02 and 50.50 will be analyzed as if they were
450 and 50
- both 63,400.21 and 505,000 would be multiples of 100
- 63.40 would not be a multiple of 5
- 50.05 would be a multiple of 25
The
Round Number Statistics are only calculated for those numbers that are
1.00 or larger (Obs >= 1.00).
Your
data can include number values under 1.00 these numbers will be
automatically deleted during the round number calculations
The
Number Frequencies are calculated for all numbers, i.e., negative numbers
and zeros can be included in the data set
Last-Two
Digits (Basic_Last2_Digits)
This
program does the calculations and prepares a graph and table for the
last-two digits test. The
last-two digits test is only meant for numbers that are integers (not
dollars and cents). Consequently:
The
Last-two digits are based on the integer portion of the number only for
numbers 10.00 and higher. The
last-two digits are the tens and units digits.
The graph
includes upper and lower confidence bounds showing the limits for
differences at the 5 percent level of significance.
The expected proportion for all the last-two digit combinations (00
to 99) is 0.01 (or 1 percent).
The
table follows the same format as the tables in the Basic_ProfileFirSecFir2
output.
Note:
The
Basic Tests are all tests that are run against a single column of numbers.
Nothing other than the column of numbers (a single field is
considered).
ADVANCED TESTS
(NINE PROGRAMS)
The
Advanced Tests analyze (1) a field of numeric data, and (2) one or more
fields representing subset variables.
These programs have shown themselves to be powerful error-detecting
technologies.
Profile
Details (Subset_Profile_Details)
This
program calculates subset details for each of the data profile categories
of the Data Profile program. The
subsets used are usually vendors but could also be, for example, dates,
zip codes, or departments. The
amounts for each subset are summed. The
output tables show:
1a. The largest subsets in the
over $10 category by amount (e.g., total dollars).
1b. The largest subsets above
can be resorted by frequency (e.g., number of invoices).
2. The largest subsets
in the 0.01 to $9.99 category by amount (can be resorted by frequency).
3. The largest subsets
for the zero dollars category by frequency.
4. The largest subsets
in the -0.01 to -$9.99 category by amount (can be resorted by frequency).
5a. The largest subsets in the
under -$10 category by amount.
5b. The largest subsets above
can be resorted by frequency (e.g., number of invoices).
6. The largest subsets
in the 0.01 to 50.00 category by amount (can be resorted by frequency).
The
table shows the largest subsets for (1) to (6) and is used for refunds,
reimbursements, or disbursements.
The
program might give a macro error message during execution if one of the
categories has a very few entries. This
is because the program is told to copy the formula down
x number of rows, but x is
zero which means that no copying can be done.
Click on Continue and the program will continue without error.
Number
Frequency Factor (Subset_Same_Numbers)
This
program lists the subsets for which all the number amounts are all equal.
An example would be a landlord that was paid 12 rent payments that
were all equal amounts. The
program prepares a table showing:
1.
The subset reference,
2.
The number of observations,
3.
The number of observations that are the same (should be the same as
(2) above).
4.
The number that is repeated, and
5.
The total of the number amounts for that subset.
The
output is sorted by number of observations (decreasing).
The output can be resorted by another criteria such as the Subset
number total.
This
program only considers number amounts of 0.01 or larger and subsets with
more than one entry. The
input can include numbers under 0.01 and subsets with only one entry.
These entries will be automatically deleted during processing.
Number
Duplication Two (Subset_Numbers_Occur_Twice)
This
program lists subsets that had a number amount occur twice (and twice
only). If a subset had two
(or more) amounts occur twice, the subset will be listed multiple times.
A subset with 63.40, 63.40, 110,364, 2,204, and 2,204 would be
listed twice (once for 63.40 and once for 2,204).
This is a basic test for duplicate payments or errors in data files
where entries are duplicated after a file is created from multiple
sources. The output table
shows:
1.
The subset reference,
2.
The amount that occurred exactly twice, and
3.
The count for the amount (which is 2 in every
case)
This program reports the Subsets where a number occurred twice and
twice only where the items are zero or larger.
Your data can include field values under zero and subsets with only
one entry, these items/subsets will be automatically deleted during
processing.