|
| |
Using Microsoft Access for Data Analysis and Interrogation
|
|
|

|
USING
MICROSOFT ACCESS FOR DATA ANALYSIS & INTERROGATION
|
The use of Benford’s Law, number patterns, ratios,
and duplications to detect
errors,
biases, fraud, irregularities, and inefficiencies in corporate
data
|
Mark J.
Nigrini Ph.D.
The data analysis tests described in this
book focus on finding anomalies in data sets. These anomalies will usually be evidenced by abnormal
duplications of record fields or especially high or low ratios between
related records. The
objective of finding these anomalies is to detect:
 |
Possible fraud |
 |
Potential errors |
 |
Potential outliers (values that are abnormally high or low
compared to average) |
 |
Biases in the data (numbers skewed to a value or range of values) |
 |
Possible processing inefficiencies (excessive repeats of
low-value transactions) |
 |
A change in the composition of the numbers between time periods
(Continuous Monitoring) |
The data analysis tests use mainly exploratory methods that look
for anomalous patterns, differences, matches, and anomalies in
data. Other types of data
analysis tests are data mining or rule discovery. Data mining generally looks for patterns and relationships among
data items using correlation tests. Correlation tests are not described in this book but they should
be quite easy to do after working through the material in the book.
Data analysis is also called statistics. Statistics deals with the collection, analysis, interpretation,
and presentation of large volumes of data. Very
few of the tests in this book can be described as statistics.
Chapter 2 discusses
Benford’s Law which gives the expected frequencies of the digits in
data and is named after Frank Benford, the physicist who published the
seminal paper in 1938. Contrary
to intuition, the digits are not all equally likely and show a biased
skewness in favor of the lower digits. Benford noted that the first few pages of a log table book show
more wear than the last pages. He
hypothesized that this was because most of the “used” numbers had a
low first digit. The common
feature of the first few pages of the tables is that they show the logs
of numbers with low first digits (1 and 2). He then analyzed the first digits of 20 lists of widely different
types of data sets with a
total of 20,229 numbers. Benford’s
results showed that 31 percent of the numbers had a first digit 1,
18 percent had a first digit 2, and only 5 percent had a first
digit 9. He saw a
pattern to his results and, using calculus, he then derived the expected
frequencies of the digits for tabulated “natural” data. The expected frequencies are shown in Table 2.1 and the formulas
are also shown in Chapter 2. The
book gives guidance in testing data sets for conformity to Benford’s
Law and shows how the tests can be done in Access in Chapters 5 and 7.
With regard to data analysis using Access, the main features
supporting data analysis through queries are:
 |
The ability to create a query using the wizards, design view, or
SQL view |
 |
The ability to query tables, prior queries, or both tables and
queries |
 |
The ability to use SQL view to modify a query created in Design
view |
 |
Performance Analyzer (Tools, Analyze, Performance) to assist in
making queries more efficient |
 |
The ability to format the output of the query |
 |
The ability to sort and resort query results without creating
multiple dynasets along the way |
 |
Built in functions for calculated fields including Iif (Immediate
If) function and full set of operators such as less than or greater than
and also And, Or, and Not |
 |
Tolerance of empty (null) fields |
 |
The ability to read all the main file formats, with its ODBC
capabilities well suited to an audit environment |
 |
The ability to easily join Tables allowing queries that are based
on conditions in both or either tables. |
 |
The ability to seamlessly export results to Excel for further
analysis or neat presentation |
 |
Excellent support available on the Microsoft website |
In the book the user is
shown the detailed steps needed (with screen shots) to run the tests.
The chapters discuss one or more tests, give notes and hints
on using the tests, include a challenge to power users, and show the
detailed steps in Access to run the tests. The introductory chapters set the stage for using Access in data
analysis and discuss the logic and architecture of Access. The book includes two test data sets that will allow the user to
duplicate the screen shots and Excel templates that produce graphs that
match the format of the graphs in the book.
Ordering
Details
Price
is $70 for copies mailed to USA and Canada.
Send
e-mail to author at mark_nigrini@msn.com
Payment
in advance can be made by credit card by logging on to www.paypal.com
and registering and then sending me $70 to my e-mail address.
Alternatively a check can be mailed to the address on my home page. |
PREFACE AND
ACKNOWLEDGEMENTS
We live in exciting times. Words
such as data warehouse, data mining, ODBC, gigabytes,
and servers get used widely at conferences. Desktop and notebook computers have progressed to processing
speeds above 2 gigahertz and 1 gigahertz respectively. Never before has data been so accessible with such ease to data
analysts and never before has such computing power been available so
cost effectively. Add the
missing ingredient – data analysis software in the form of Microsoft
Access – and we have the recipe to accomplish great things.
In professional circles the
Institute of Internal Auditors has redefined internal auditing to
include more of a consulting role than was previously the case. For external auditors the Panel on Audit Effectiveness suggested
that auditors develop new approaches to auditing, including some form of
continuous auditing, with greater emphasis on the use of
technology-driven analytical and diagnostic procedures. This and the high volume of questions posted on Microsoft’s
newsgroup sites, the breadth of the questions, and the quality of the
answers all point to the fact that many people are currently involved
with data collection and data management and that still more will be
involved in data analysis in the years to come.
My foray into the world of data analysis started in 1989 with a
half page discussion of Benford’s Law in a Statistical Decision Theory
book. Fortunately the author of the book, James Berger, gave the
full reference to Benford’s paper. I found the idea that there were predictable patterns to the
digits in tabulated data to be very intriguing. As a Ph.D. student at the University of Cincinnati my thoughts
were that anything intriguing had the potential to be a dissertation
topic and/or a published paper. Soon
after reading about Benford’s Law I went to the library and eagerly
copied the 1938 paper written by Frank Benford. I read the article over and over again in a mild disbelief.
Benford set out in precise terms the expected single digit and
digit combination frequencies for lists of data. These expected frequencies are now known as Benford’s Law.
I recorded my thoughts in a
paper titled The Frequency Distribution of Digits and its Application
to Accounting Research dated July, 1989. With hindsight the fact that I saw many varied applications was
good,
the fact that I was advocating only one test was not good. After much work I completed my dissertation in 1992 titled
The
Detection of Income Tax Evasion through an Analysis of Digital
Distributions. The
taxpayer tax return data set that I used had 100,000 records and the
thought of analyzing such a large data set using anything other than a
mainframe computer, SAS, and JCL never even occurred to me. One thing that is obvious with hindsight, is that in the course
of writing my dissertation I developed some additional data analysis
routines to Benford’s Law.
In August, 1993 I accepted a position at Saint Mary’s
University in Halifax, Nova Scotia. I was convinced that Benford’s Law on its own was a powerful
fraud detection technique. My
breakthrough came a few years later when on July 10, 1995 The Wall
Street Journal published a story titled He’s got their number:
Scholar uses math to foil financial fraud. The He was me.
Since that time I’ve published papers on Digital
Analysis in academic journals such as The
Journal of the American Taxation Association, Auditing:
A Journal of Practice and Theory, and
The Journal of Accounting Education,
and in practitioner journals such as Internal Auditor, Journal of
Accountancy, IT Audit Forum, The White Paper, and
IEEE Potentials. My
first book Digital Analysis Using
Benford’s Law: Tests and Statistics for Auditors is published by
Global Audit Publications. Besides The Wall Street Journal my Benford’s Law work has been
discussed in Canadian Business, Technology in Government,
Contingencies,
The Globe and Mail, CA Magazine, CFO Magazine, The
Financial Times, The New York Times, Der Spiegel, Business
Week, Dallas Morning News, Philadelphia Inquirer, The Journal of
Accountancy, and USA Today. The papers and the interviews all have two threads in common, (1)
they focus on Benford’s Law, and (2) they require any user to write
their own data analysis routines.
Over the course of the past two
years I’ve developed this book which combines (1) data analysis tests
including Benford’s Law, and (2) Microsoft Access. What we have now is a book that describes sixteen data analysis
tests, and how to run these tests in Microsoft Access. In general, books on
data analysis are books on statistics. The basic building blocks of statistics are probability
distributions, hypothesis testing, correlation, regression, analysis of
variance, and time series analysis. The tests described in this book are not
statistics,
but instead question the data, looking for oddities and anomalies.
These tests aim for high level overviews and then to run the
records through the tests looking for things such as signs of procedural
inefficiencies, red flags for fraud, and red flags for errors. Most of the tests are tests derived from the tenet of Benford’s
Law which is to identify abnormal duplications in data. The data analysis tests in Chapter 13 (Histogram, Duplicates,
Gaps, and Aging data) were included because they have stood the test of
time in auditing circles.
This book should be of use to
data analysts whether your job title or function includes the words
analyst or auditor. I have
been surprised by the lack of coverage of data analysis in the
off-the-shelf books on Access. This
is probably because of the volume of material that there is to cover in
a database manual covering, inter alia, data entry, forms,
reports, macros, modules, tables and importing/exporting of data. This book covers data analysis tests and queries in depth.
In general, the data analysis routines require the use of one or
more of the following:
 |
Calculating sums and counts using the
Where
criteria in Access
|
 |
Creating calculated fields to create
fields that are calculated using data in other fields
|
 |
Using built-in functions for complex
calculations
|
 |
Grouping records and then getting the
Count, or Sum, or Maximum value for the groups
|
 |
Identifying duplicate records usually
with the Find Duplicates Query Wizard and non-matches with the Find
Unmatched Query Wizard.
|
 |
Using a Join to query two or
more tables for matching records or non-matching records.
|
This book uses most of Access’
capabilities including a Make-Table query, creating a running sum
(cumulative total), creating a table from scratch, identifying items in
a sequence that are missing from a sequence. The conventional use of tables makes finding what is missing
particularly challenging. We
import data using File Import, we export output to Excel, we show how to
create a UNION query, we go into SQL view and tweak it to meet our
needs. We even turn a Join
operation inside out by using Is Null.
There are three main parts to this book.
First, we use three chapters to review the theory and practical
issues associated with data analysis and the architecture of Access.
Chapter 4 is an overview of some basic data analysis routines
using Access. Second,
chapters 5 through 13 discuss the tests in those chapters with notes and
tips for users. The notes
and tips cover my experiences and feedback from other users. Third, chapters 5 through 13 show how to do these tests in
Access. The book includes a
test data set and the Access screen shots show step-by-step how to do
the tests and what your output should look like if you follow the steps
using the test data set.
The chapters describing data analysis tests have a Challenge to
Power Users. Here I briefly
describe the logic used to get Access to generate the desired results.
Power users can read about the tests and can then simply follow
the logic on their data sets without wading through the remaining pages.
Other users can use these pages as a summary of the Access logic
which should make following the steps a little easier. Some of these sections include a question or challenge for the
power users. Feedback is
welcomed. I’ve also
included by way of appendices some resources on the web, including the
web address for this book (extra notes, updates etc.) and a glossary of
selected Access terms related to data analysis. Thanks to Microsoft for permission to use this material from
their website.
This book took two years to complete.
Thanks to my Computer Modeling students at SMU for working
diligently on their Access projects. Some of the innovative ideas of Eduardo Carlo, Karen Perry, and
Mark Wood are included in this book. The first year was basically spent confirming that Access could
in fact do all the tests known as Digital Analysis and could also do the
Chapter 12 (audit sampling) and Chapter 13 (other miscellaneous) tests
for both Access 97 and Access 2000. The second year was spent working on the introductory chapters,
writing the summary of the tests, tips and notes for users, and all the
ancillary material.
My thanks to Mark Irving
of
Allete in Duluth, MN for working through the drafts of some of the
chapters. Thanks too to
Will Yancy for his review of the audit sampling chapter. I’d also like to thank Duane Hookom, and A. Nijborg for their
tips that I got from the Microsoft Access Newsgroups site. A large measure of thanks also goes to the management and
auditors at American Airlines for integrating my data analysis tests
into a comprehensive Data Analysis/Access course presented at AMR
headquarters. Steve Proesel has been instrumental over many years in
advocating the use of data analysis and up-to-date-software for all
aspects of data interrogation at American. Steven Bemus spent time trying to figure out easier solutions for
some of the Access routines and allowed me to “borrow” some of his
thoughts for the introductory material. A special thanks goes to Susan Oldfield for her ideas and for
telling me about Snag-It which was used for the screen shots.
I’d be interested in your feedback for future editions of this
book and also for updates for the website www.nigrini.com.
Good luck with your venture into the world of data analysis and
data interrogation.
Mark J. Nigrini Ph.D., Dallas, Texas.
January 11, 2002.
|
|
USING MICROSOFT
ACCESS FOR DATA ANALYSIS AND INTERROGATION
|
|
|
|
|
|
1
|
INTRODUCTION ………………….………………………………………
|
|
1
|
|
|
History of data analysis
|
|
2
|
|
|
Data analysis in auditing
|
|
3
|
|
|
The layout of this book
|
|
4
|
|
|
|
|
|
|
2
|
DATA
ANALYSIS THEORY AND PRACTICAL ISSUES
…………….
|
|
5
|
|
|
Benford’s Law
|
|
6
|
|
|
Relationship to logs of numbers
|
|
11
|
|
|
Applications of Benford’s Law
|
|
14
|
|
|
Which data sets should conform to Benford’s Law?
|
|
15
|
|
|
The effect of data set size
|
|
17
|
|
|
Practical considerations for data analysis
|
|
18
|
|
|
Obtaining data in a corporate environment
|
|
19
|
|
|
Summary
|
|
20
|
|
|
|
|
|
|
3
|
THE
BASICS OF ACCESS ……………….…………………………..…..
|
|
21
|
|
|
The architecture of Access
|
|
22
|
|
|
Notes on Access tables
|
|
23
|
|
|
Required properties of database tables
|
|
23
|
|
|
General tips on reading data into Access
|
|
25
|
|
|
Queries
|
|
30
|
|
|
Miscellaneous Access notes
|
|
33
|
|
|
|
|
|
|
4
|
AN
OVERVIEW OF DATA IMPORT, QUERIES AND CALCULATED FIELDS
|
|
35
|
|
|
Loading a data set into Access
|
|
36
|
|
|
Basic summary statistics using built-in query wizard
|
|
48
|
|
|
Selecting the high value records
|
|
54
|
|
|
Creating a calculated field
|
|
61
|
|
|
Grouping and counting records
|
|
64
|
|
|
|
|
|
|
5
|
DATA
PROFILE AND BASIC DIGIT TESTS
………………………..…
|
|
69
|
|
|
Data profile
|
|
70
|
|
|
Data profile: Tips and notes
|
|
71
|
|
|
Basic digit tests
|
|
72
|
|
|
Basic digit tests: Notes and tips
|
|
73
|
|
|
Challenge to power users
|
|
75
|
|
|
Importing and reading the accounts payable data set
|
|
76
|
|
|
Creating the data profile
|
|
83
|
|
|
Exporting the Access results to Excel
|
|
87
|
|
|
Performing the basic digit tests
|
|
91
|
|
|
Graphing the basic digit tests
|
|
94
|
|
|
|
|
|
|
6
|
NUMBER
DUPLICATION ……………………………………………….
|
|
99
|
|
|
Notes on the number duplication test
|
|
100
|
|
|
Challenge to power users
|
|
101
|
|
|
Running the number duplication test
|
|
102
|
|
|
|
|
|
|
7
|
LAST-TWO DIGITS AND ROUND NUMBERS ………………….……
|
|
105
|
|
|
Notes on the last-two digits and round numbers tests
|
|
106
|
|
|
Challenge to power users
|
|
107
|
|
|
Running the last-two digits test
|
|
108
|
|
|
Running the round numbers test
|
|
115
|
|
|
|
|
|
|
8
|
LARGEST
SUBSETS AND ALL NUMBERS EQUAL
……………..….
|
|
119
|
|
|
Notes on the largest subsets test
|
|
120
|
|
|
Notes on the all numbers equal test
|
|
122
|
|
|
Challenge to power users
|
|
123
|
|
|
Running the largest subsets test
|
|
124
|
|
|
Running the all numbers equal test
|
|
126
|
|
|
|
|
|
|
9
|
NUMBERS
OCCUR TWICE AND ROUND NUMBERS
…….………..
|
|
135
|
|
|
Notes on the numbers occur twice test
|
|
135
|
|
|
Notes on the round number subsets test
|
|
136
|
|
|
Challenge to power users
|
|
137
|
|
|
Running the numbers occur twice test
|
|
138
|
|
|
Running the round numbers subset test
|
|
140
|
|
|
|
|
|
|
10
|
RELATIVE
SIZE FACTOR ………………………………………………
|
|
143
|
|
|
Notes on the relative size factor test
|
|
144
|
|
|
Challenge to power users
|
|
145
|
|
|
Running the relative size factor test
|
|
146
|
|
|
|
|
|
|
11
|
DUPLICATE
RECORDS AND SAME-SAME-DIFFERENT TESTS
|
|
157
|
|
|
Notes on the duplicate records test
|
|
157
|
|
|
Notes on the same-same-different test
|
|
158
|
|
|
Challenge to power users
|
|
159
|
|
|
Running the duplicate records test
|
|
160
|
|
|
Running the same-same-different test
|
|
164
|
|
|
|
|
|
|
12
|
AUDIT
SAMPLING ……………………………………………………….
|
|
173
|
|
|
Difference estimation using hypothesis testing
|
|
174
|
|
|
Monetary unit sampling
|
|
177
|
|
|
Challenge to power users
|
|
178
|
|
|
Extracting a random sample
|
|
179
|
|
|
Extracting a MUS sample
|
|
182
|
|
|
Special notes on the techniques used in this chapter
|
|
194
|
|
|
|
|
|
|
13
|
HISTOGRAM,
DUPLICATES, GAPS, AND AGING
……….……..…..
|
|
195
|
|
|
Creating a data set in Access
|
|
196
|
|
|
Challenge to power users
|
|
187
|
|
|
Preparing a histogram using a union query
|
|
198
|
|
|
Preparing a histogram using linked queries
|
|
203
|
|
|
Identifying duplicates in data
|
|
208
|
|
|
Identifying gaps in data
|
|
211
|
|
|
Aging data
|
|
223
|
|
|
|
|
|
|
14
|
REVIEW
AND CONCLUSIONS …………………………………………
|
|
225
|
|
|
Using Access for data analysis
|
|
226
|
|
|
The format of data analysis queries in Access
|
|
226
|
|
|
A summary of the data analysis tests
|
|
227
|
|
|
Continuous monitoring
|
|
229
|
|
|
Future data analysis tests
|
|
229
|
|
|
Closing thoughts
|
|
230
|
|
|
|
|
|
|
|
APPENDIX
A: Resources on the Internet
|
|
231
|
|
|
|
|
|
|
|
APPENDIX
B: Bibliography
|
|
233
|
|
|
|
|
|
|
|
APPENDIX
C: Glossary of Selected Access 2002 Terms
|
|
235
|
|
|
|
|
|
|
|
Index
|
|
243
|
|
Here are the two data files needed to duplicate the work in the
book and three Excel files to be used to produce a neat data
profile, basic digit tests, and last-two digits graph.
|
|
|
In Appendix A the site http://www.athree.com
is shown as a forum for posting of questions. This
site was closed in April, 2002. If you go to this site, you will
be redirected to the successor site (utteraccess.com).
|
| Tenet
Healthcare, Medicare Audit.
Tenet Healthcare is currently the
focus of a Medicare audit. The extract from the article below
notes that the Redding Hospital had a heart surgery rate that was the
highest in California, and twice the national average.
Audit
of suspected Medicare billing fraud
In the Data Analysis Using Access book
(described above) the test shown in Chapter 8 (Largest Subsets) was,
amongst others, designed to detect instances where a fraud occurs to
such an extent that the subset (vendor, employee, hospital, location,
frequent flyer account etc.) total was so high that a listing of the
subset totals has them ranked near the top. In many cases
fraudsters "just don't know when to stop." |
|
| Recommended
books on Microsoft Access
These are the books in my Access
library that you might consider adding to your library too:
February 19, 2006
update:
I have just spent 30
minutes randomly leafing through my newest book on Microsoft Access and
my first impression is that this book is excellent. The
book is Microsoft Access Data Analysis written by Michael
Alexander and published by Wiley. In the first 15 minutes alone I
saw (1) a way to add the Rank to the output of a query (I could this
before using a Make Table query and adding an ID field but this book's
method is better), (2) getting a frequency distribution using one query
(I could do this but my method using a linked query takes longer), and
(3) parsing strings in a field separated by commas (this is exactly what
the output looks like from a ERP system and I could do this using
multiple Append queries, but again this book's method is better).
The book also actually talks about things like the Domain Aggregate
Functions and the very valuable Switch function. This book is a
must have for anyone that spends time using Access queries. The
book could be called "Queries 'R Us."
My second choice for
your library:
"Running Microsoft Access 2000" by
John Viescas, published by Microsoft Press (about $45). This is a
good comprehensive A-Z reference on Access. See http://www.viescas.com/Info/books.htm
"Building Microsoft Access
Applications" by John Viescas.
Optional extras:
"Access 2002 for Dummies" by
John Kaufeld (about $22). This is a nicely written book that would
be good for someone about to take a seminar or about to start using at
work - a good first book. Book can be found in most bookstores and
on amazon.
"Access 2002 MOUS Expert
Level" by Floyd Winters and Julie Manchester, published by Prentice
Hall (about $50). This book covers some advanced material and does
so very well. You will need Access 2002 to be able to use the
files on the CD that comes with the book. See, http://vig.prenhall.com/catalog/academic/product/0,4096,0130497851,00.html
"Access 2000 Essentials
Intermediate" by Robert Ferrett, Sally Preston, and John Preston
(about $20). This book talks about when and why you would want to
use some Access objects. Good sections on forms, reports, and
macros. See, http://vig.prenhall.com/catalog/academic/product/0,4096,1580763014,00.html
"Access 2000 Essentials
Advanced" see above. Book is well suited to someone that has
a project up and running and now wants some fine tuning. See, http://vig.prenhall.com/catalog/academic/product/0,4096,1580763022,00.html
"Database design for mere
mortals" by Michael Hernandez (about $50). This book covers
all the design issues in enough detail for an auditor that needs to
audit a database application of for someone embarking on a first project
of low complexity. See http://www.formeremortals.com/
|
| Recommended
websites for users of Microsoft Access
http://www.rogersaccesslibrary.com
(site includes a compendium of over 100
Access routines that show a solution to an Access problem that a user
might be faced with. Site also includes links to other Access
libraries, link to Access webring, and articles (called papers) on
miscellaneous Access topics. Well worth a visit. Roger
Carlson answers e-mails sent to him) |
|
| Microsoft Access web-based seminars |
|
Institute of Internal
Auditors Audit Learning: This
web-based training course will review Benford's Law and some other data
analysis tests and will show how the tests can be done in Microsoft
Excel and Microsoft Access. CPE hours will be one, two, or three hours, and
the web-based
Benford's Law seminar should be available in early January, 2003.
The IIA's high-quality web-based seminars are usually $40 per CPE hour.
The IIA is a NASBA approved sponsor for CPE credit.
|
| Benford's Law,
Statistics, and Microsoft Access consulting in Germany, see site of Dr.
Marcus Herold. |
| My forthcoming Microsoft Access book's (updated
28 March, 2007): |
|
Using
Microsoft Access 12 for Data Analysis and Interrogation. This
will be an update to my current book (shown above) and will be updated
as follows: (1) screen shots and methods (such as data import) will be
based on Access 12 (due for release in late 2006), (2) book will
include exercises and cases so that it can be used as a college
textbook, (3) book will include Excel macro's and Excel procedures so
that the tests could be run in Excel 12 with it's forthcoming 1,048,576
rows, and (d) book will include additional tests along the lines of my
recent journal publication titled "Monitoring Techniques
Available to the Forensic Accountant" published in the Journal
of Forensic Accounting (December, 2007).
Available
October, 2007. |
|
Continuous Monitoring. This book will show how to build selected
continuous monitoring applications using both Excel 12 and Access 12 that would be of value to Internal Audit
and Forensic Accounting departments. Examples would include a
system to place audit units units on a watch-list after being alerted to
possible irregularities as a result of Continuous Monitoring
activities. Other examples will be drawn from my Continuous
Monitoring seminar and my academic and professional work.
Available
middle of 2008.
|
| ---------------- |
|
TOP |
|
|
Home |
|