Relative Size Factor (Subset_Relative_Size)
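This program calculates the Relative Size Factor (RSF), which is the largest number divided by the second largest number, for each subset. The test is useful for detecting errors in the amounts recorded in a subset, because a single amount that is far larger than the rest of its subset produces a high RSF. The output table shows the: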
1. Subset reference,
2. Largest number for the subset,
3. Second largest number for the subset,
4. Relative Size Factor (calculated as (2) divided by (3)), and
5. Total number of items for the subset.
The RSF is calculated only for subsets with more than one entry, and only numbers of 1.00 and larger are taken into account. Your data can include field values under 1.00 and subsets with only one entry; these items and subsets are deleted automatically during processing.
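The calculation can be sketched in a few lines of Python. This is a minimal illustration, not the program's own code. The input shape of (subset, amount) pairs, the function name `relative_size_factors`, counting only the qualifying items, and sorting the output by descending RSF are all assumptions.

```python
from collections import defaultdict


def relative_size_factors(records):
    """Relative Size Factor (largest / second largest) for each subset.

    Assumption: `records` is an iterable of (subset, amount) pairs.
    """
    amounts = defaultdict(list)
    for subset, amount in records:
        # Only numbers of 1.00 and larger are taken into account.
        if amount >= 1.00:
            amounts[subset].append(amount)

    rows = []
    for subset, values in amounts.items():
        # Subsets with fewer than two qualifying entries are dropped.
        if len(values) < 2:
            continue
        values.sort(reverse=True)
        largest, second = values[0], values[1]
        rows.append({
            "subset": subset,
            "largest": largest,
            "second_largest": second,
            "rsf": largest / second,
            # Assumption: the item count covers qualifying items only.
            "count": len(values),
        })
    # Assumption: highest RSF first, so the most unusual subsets are on top.
    rows.sort(key=lambda row: row["rsf"], reverse=True)
    return rows


# Hypothetical example data: (vendor number, invoice amount).
data = [("V01", 50000.00), ("V01", 500.00), ("V01", 450.00),
        ("V02", 120.00), ("V02", 0.50),
        ("V03", 75.00), ("V03", 80.00)]
for row in relative_size_factors(data):
    print(row["subset"], row["largest"], row["second_largest"],
          round(row["rsf"], 2), row["count"])
```

In this example V02 is dropped because only one of its amounts is 1.00 or larger. V01 has an RSF of 100, which might point to a misplaced decimal point or extra zeros in the 50,000.00 entry.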
Round Number Subsets (Subset_Round_Numbers)
This program checks subsets for abnormal duplications of round numbers. One output covers numbers divisible by 100, and another output covers numbers divisible by 1,000. The outputs show the following details for each subset:
1. Subset reference,
2. Total of the numbers for the subset,
3. Z-statistic, which measures the significance of the difference between the actual proportion of round numbers and the expected proportion of 0.01 (for multiples of 100) or 0.001 (for multiples of 1,000),
4. Count of all the numbers in the subset,
5. Count of the round numbers, and
6. Proportion of the numbers that are round ((5) divided by (4)).
The Round Number Subset statistics are calculated only for subsets with one or more items of 1.00 or larger. Your data can include field values under 1.00; these items, and any subsets left with no items, are deleted automatically during processing.
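The document does not give the formula for the Z-statistic. The sketch below assumes the standard one-sample test for a proportion, z = (p - p0) / sqrt(p0 * (1 - p0) / n), where p is the actual proportion, p0 is 0.01 or 0.001, and n is the count of qualifying numbers. The program may apply refinements such as a continuity correction, so treat this as an illustration only. The function name `round_number_stats` is hypothetical.

```python
import math


def round_number_stats(amounts, base=100):
    """Round-number statistics for one subset, for multiples of `base` (100 or 1,000)."""
    # Items under 1.00 are excluded before any statistic is computed.
    values = [a for a in amounts if a >= 1.00]
    n = len(values)
    if n == 0:
        return None  # subset is dropped
    # Only the integer portion is checked, so 500.05 is treated as 500.
    round_count = sum(1 for v in values if int(v) % base == 0)
    proportion = round_count / n
    expected = 1 / base  # 0.01 for multiples of 100, 0.001 for multiples of 1,000
    # Assumption: plain one-sample z-test for a proportion, no continuity correction.
    z = (proportion - expected) / math.sqrt(expected * (1 - expected) / n)
    return {
        "total": sum(values),
        "z": z,
        "count": n,
        "round_count": round_count,
        "proportion": proportion,
    }


# Hypothetical amounts for a single subset.
amounts = [500.00, 1200.00, 3000.00, 45.10, 999.99, 0.75]
for base in (100, 1000):
    stats = round_number_stats(amounts, base)
    print(base, stats["round_count"], stats["count"],
          round(stats["proportion"], 3), round(stats["z"], 2))
```

The 0.75 item is excluded, leaving five qualifying amounts. Three of them are multiples of 100, far above the expected 1 percent, so z is large and positive. For a subset this small the normal approximation is rough, and the figure is best read as a ranking aid.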
The Round Number statistics are based only on the integer portion of each number:
- all digits to the right of the decimal point are deleted before the program checks whether the number is round;
- numbers such as 4,500.20 and 505.05 are analyzed as if they were 4,500 and 505;
- both 63,400.21 and 505,000 are multiples of 100;
- 634.00 is not a multiple of 100; and
- 500.05 is a multiple of 100.
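The truncation rule can be checked with a one-line helper. The name `is_round` is hypothetical. The sketch assumes positive amounts, which the 1.00 threshold guarantees, so truncating toward zero is the same as dropping the decimal digits.

```python
def is_round(amount, base=100):
    """True if the integer portion of `amount` is a multiple of `base`."""
    # int() truncates toward zero, discarding every digit right of the decimal point.
    return int(amount) % base == 0


# The examples from the list above.
for amount in [4500.20, 505.05, 63400.21, 505000, 634.00, 500.05]:
    print(f"{amount:>12,.2f} -> {int(amount):>8,} -> multiple of 100: {is_round(amount)}")
```

The same helper with `base=1000` covers the multiples-of-1,000 output. For example, 505,000 qualifies there but 4,500.20 does not.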
The next group of Advanced tests uses more than one subset variable. These are all especially powerful tests for detecting errors.
Two Subsets (Subset_SameSameDiff)
This program finds instances where records in two different subsets have the same entries in two other fields. It is a powerful test for errors and could, for example, find observations where:
- the invoice numbers are the same,
- the dollar amounts are the same, and
- the subsets (vendor numbers) are different.
Another test could look for the same date, the same dollar amount, and different vendor numbers. The output shows each match on two lines (more if more than two records match), and the details shown are:
1. The first field found to be the same,
2. The second field found to be the same (a numeric field), and
3. The subset reference.
A “Yes” in the Duplicates column indicates that there were two identical entries for that observation.
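A minimal Python sketch of the Same-Same-Different idea follows. It groups records on the two matching fields and keeps groups that span at least two subsets. The function name, the field names (`invoice`, `amount`, `vendor`), and the reading of the Duplicates column as "two records in the match are identical on all three fields" are assumptions, not the program's documented logic.

```python
from collections import defaultdict


def same_same_different(records, field1, field2, subset_field):
    """Records that agree on field1 and field2 but belong to two or more subsets."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec[field1], rec[field2])].append(rec[subset_field])

    matches = []
    for (value1, value2), subsets in groups.items():
        # A match needs at least two different subset references.
        if len(set(subsets)) < 2:
            continue
        # Assumed reading of the Duplicates column: "Yes" when two records in the
        # match are identical on all three fields (same subset appears twice).
        duplicates = "Yes" if len(subsets) > len(set(subsets)) else "No"
        # One output line per record, so each match spans two or more lines.
        for subset in subsets:
            matches.append((value1, value2, subset, duplicates))
    return matches


# Hypothetical accounts payable records.
invoices = [
    {"invoice": "A-1001", "amount": 2450.00, "vendor": "V017"},
    {"invoice": "A-1001", "amount": 2450.00, "vendor": "V203"},
    {"invoice": "A-1002", "amount": 310.00, "vendor": "V017"},
    {"invoice": "A-1002", "amount": 310.00, "vendor": "V017"},
    {"invoice": "B-0042", "amount": 980.00, "vendor": "V051"},
    {"invoice": "B-0042", "amount": 980.00, "vendor": "V051"},
    {"invoice": "B-0042", "amount": 980.00, "vendor": "V099"},
]
for line in same_same_different(invoices, "invoice", "amount", "vendor"):
    print(*line)
```

The A-1002 pair is not reported because both records carry the same vendor. Exact repeats like that one are the target of the "Alike" tests instead.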
Three Subsets (Subset_SameSameSameDiff)
This program is an extension of the one above and detects instances where:
- a numeric field is the same (e.g., dollar amount),
- another field is the same (e.g., invoice number),
- a third field is the same (e.g., invoice date), and
- a character field is different (e.g., vendor number).
This test is very useful for very large data files where Two Subsets yields thousands of matches, because requiring a third field to match narrows the results. The output format is the same as for the Two Subsets (Same-Same-Different) program.
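The narrowing effect can be illustrated by generalizing the grouping key to any list of matching fields. This sketch uses a hypothetical helper, `same_n_different`, and made-up field names. It compares a two-field match with a three-field match on the same records.

```python
from collections import defaultdict


def same_n_different(records, same_fields, diff_field):
    """Groups of records that agree on every field in `same_fields`
    but hold two or more distinct values in `diff_field`."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in same_fields)].append(rec)
    return {
        key: group
        for key, group in groups.items()
        if len({rec[diff_field] for rec in group}) >= 2
    }


# Hypothetical records: the B-0042 pair matches on amount and invoice
# number but not on invoice date.
records = [
    {"amount": 2450.00, "invoice": "A-1001", "date": "2023-03-14", "vendor": "V017"},
    {"amount": 2450.00, "invoice": "A-1001", "date": "2023-03-14", "vendor": "V203"},
    {"amount": 980.00, "invoice": "B-0042", "date": "2023-01-05", "vendor": "V051"},
    {"amount": 980.00, "invoice": "B-0042", "date": "2023-06-30", "vendor": "V099"},
]
two_fields = same_n_different(records, ["amount", "invoice"], "vendor")
three_fields = same_n_different(records, ["amount", "invoice", "date"], "vendor")
print(len(two_fields), len(three_fields))
```

This prints `2 1`. Adding the date as a third matching field removes the B-0042 pair, which is the kind of reduction that makes the three-field test practical on very large files.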
Three Alike (Subset_SameSameSame)
This program detects matches where:
- a field is the same,
- another field is the same, and
- a third field is the same.
This test is useful for finding unusual matches in accounts payable, employee reimbursement, inventory, and payroll files.
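A minimal sketch of the Same-Same-Same match follows, grouping records on a key built from the listed fields and keeping keys shared by two or more records. The helper `find_alike` and the field names are hypothetical. The sketch is not tied to three fields: passing four field names gives a Four Alike style check.

```python
from collections import defaultdict


def find_alike(records, fields):
    """Map each shared key to the positions of the two or more records that
    have the same value in every field listed in `fields`."""
    groups = defaultdict(list)
    for position, rec in enumerate(records):
        groups[tuple(rec[f] for f in fields)].append(position)
    return {key: positions for key, positions in groups.items() if len(positions) >= 2}


# Hypothetical employee reimbursement claims.
claims = [
    {"employee": "E104", "date": "2023-05-02", "amount": 86.40},
    {"employee": "E104", "date": "2023-05-02", "amount": 86.40},
    {"employee": "E104", "date": "2023-05-03", "amount": 86.40},
    {"employee": "E221", "date": "2023-05-02", "amount": 86.40},
]
print(find_alike(claims, ["employee", "date", "amount"]))
```

Only the first two claims match on all three fields, which could indicate the same expense submitted twice.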
Four Alike (Subset_SameSameSameSame)
This program is an extension of the one above and detects instances where four fields are the same for two different observations.
This test is very useful for finding unusual matches in accounts payable, employee reimbursement, refund, sales data, inventory, and payroll files.
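An alternative to hashing, which may suit very large files, is to sort on the four fields and then scan for runs of equal keys, as a database `ORDER BY` would. This sketch uses the standard `itertools.groupby` and `operator.itemgetter`. The helper name and field names are hypothetical, and the approach is not necessarily how the program itself works. It assumes every value in a given field position is comparable with the others (for example, no `None` mixed with strings).

```python
from itertools import groupby
from operator import itemgetter


def alike_by_sorting(records, fields):
    """Sort on the listed fields, then collect runs of two or more equal keys."""
    key = itemgetter(*fields)  # returns a tuple when several fields are given
    ordered = sorted(records, key=key)
    runs = []
    # groupby only merges adjacent items, which is why the sort comes first.
    for value, group in groupby(ordered, key=key):
        members = list(group)
        if len(members) >= 2:
            runs.append((value, members))
    return runs


# Hypothetical refund records.
refunds = [
    {"customer": "C88", "date": "2023-07-01", "amount": 129.99, "store": "S12"},
    {"customer": "C90", "date": "2023-07-02", "amount": 45.00, "store": "S12"},
    {"customer": "C88", "date": "2023-07-01", "amount": 129.99, "store": "S14"},
    {"customer": "C88", "date": "2023-07-01", "amount": 129.99, "store": "S12"},
]
for value, members in alike_by_sorting(refunds, ["customer", "date", "amount", "store"]):
    print(value, len(members))
```

Sorting costs O(n log n) compared with roughly O(n) for hashing, but it places matching records next to each other. That makes the matches easy to page through in a report.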