Data Analysis Technology for the Audit Community

For my Home Page, click here

Using Microsoft Access for Data Analysis and Interrogation

 

Cover

cover.jpg (84466 bytes)

Summary and Ordering

USING MICROSOFT ACCESS FOR DATA ANALYSIS & INTERROGATION

 

The use of Benford’s Law, number patterns, ratios, and duplications to detect errors, biases, fraud, irregularities, and inefficiencies in corporate data

Mark J. Nigrini Ph.D.

The data analysis tests described in this book focus on finding anomalies in data sets.  These anomalies will usually be evidenced by abnormal duplications of record fields or especially high or low ratios between related records.  The objective of finding these anomalies is to detect:

bullet

Possible fraud

bullet

Potential errors

bullet

Potential outliers (values that are abnormally high or low compared to average)

bullet

Biases in the data (numbers skewed to a value or range of values)

bullet

Possible processing inefficiencies (excessive repeats of low-value transactions)

bullet

A change in the composition of the numbers between time periods (Continuous Monitoring)

                The data analysis tests use mainly exploratory methods that look for anomalous patterns, differences, matches, and anomalies in data.  Other types of data analysis tests are data mining or rule discovery.  Data mining generally looks for patterns and relationships among data items using correlation tests.  Correlation tests are not described in this book but they should be quite easy to do after working through the material in the book.  Data analysis is also called statistics.  Statistics deals with the collection, analysis, interpretation, and presentation of large volumes of data.  Very few of the tests in this book can be described as statistics.

Chapter 2 discusses Benford’s Law which gives the expected frequencies of the digits in data and is named after Frank Benford, the physicist who published the seminal paper in 1938.  Contrary to intuition, the digits are not all equally likely and show a biased skewness in favor of the lower digits.  Benford noted that the first few pages of a log table book show more wear than the last pages.  He hypothesized that this was because most of the “used” numbers had a low first digit.  The common feature of the first few pages of the tables is that they show the logs of numbers with low first digits (1 and 2).  He then analyzed the first digits of 20 lists of widely different types of  data sets with a total of 20,229 numbers.  Benford’s results showed that 31 percent of the numbers had a first digit 1, 18 percent had a first digit 2, and only 5 percent had a first digit 9.  He saw a pattern to his results and, using calculus, he then derived the expected frequencies of the digits for tabulated “natural” data.  The expected frequencies are shown in Table 2.1 and the formulas are also shown in Chapter 2.  The book gives guidance in testing data sets for conformity to Benford’s Law and shows how the tests can be done in Access in Chapters 5 and 7.

                With regard to data analysis using Access, the main features supporting data analysis through queries are:

bullet

The ability to create a query using the wizards, design view, or SQL view

bullet

The ability to query tables, prior queries, or both tables and queries

bullet

The ability to use SQL view to modify a query created in Design view

bullet

Performance Analyzer (Tools, Analyze, Performance) to assist in making queries more efficient

bullet

The ability to format the output of the query

bullet

The ability to sort and resort query results without creating multiple dynasets along the way

bullet

Built in functions for calculated fields including Iif (Immediate If) function and full set of operators such as less than or greater than and also And, Or, and Not

bullet

Tolerance of empty (null) fields

bullet

The ability to read all the main file formats, with its ODBC capabilities well suited to an audit environment

bullet

The ability to easily join Tables allowing queries that are based on conditions in both or either tables.

bullet

The ability to seamlessly export results to Excel for further analysis or neat presentation

bullet

Excellent support available on the Microsoft website

In the book the user is shown the detailed steps needed (with screen shots) to run the tests.  The chapters discuss one or more tests, give notes and hints on using the tests, include a challenge to power users, and show the detailed steps in Access to run the tests.  The introductory chapters set the stage for using Access in data analysis and discuss the logic and architecture of Access.  The book includes two test data sets that will allow the user to duplicate the screen shots and Excel templates that produce graphs that match the format of the graphs in the book.

Ordering Details

Price is $70 for copies mailed to USA and Canada.

Send e-mail to author at mark_nigrini@msn.com

Payment in advance can be made by credit card by logging on to www.paypal.com and registering and then sending me $70 to my e-mail address.  Alternatively a check can be mailed to the address on my home page.

TOP

Home

Preface

PREFACE AND ACKNOWLEDGEMENTS

            We live in exciting times.  Words such as data warehouse, data mining, ODBC, gigabytes, and servers get used widely at conferences.  Desktop and notebook computers have progressed to processing speeds above 2 gigahertz and 1 gigahertz respectively.  Never before has data been so accessible with such ease to data analysts and never before has such computing power been available so cost effectively.  Add the missing ingredient – data analysis software in the form of Microsoft Access – and we have the recipe to accomplish great things.

In professional circles the Institute of Internal Auditors has redefined internal auditing to include more of a consulting role than was previously the case.  For external auditors the Panel on Audit Effectiveness suggested that auditors develop new approaches to auditing, including some form of continuous auditing, with greater emphasis on the use of technology-driven analytical and diagnostic procedures.  This and the high volume of questions posted on Microsoft’s newsgroup sites, the breadth of the questions, and the quality of the answers all point to the fact that many people are currently involved with data collection and data management and that still more will be involved in data analysis in the years to come.

            My foray into the world of data analysis started in 1989 with a half page discussion of Benford’s Law in a Statistical Decision Theory book.  Fortunately the author of the book, James Berger, gave the full reference to Benford’s paper.  I found the idea that there were predictable patterns to the digits in tabulated data to be very intriguing.  As a Ph.D. student at the University of Cincinnati my thoughts were that anything intriguing had the potential to be a dissertation topic and/or a published paper.  Soon after reading about Benford’s Law I went to the library and eagerly copied the 1938 paper written by Frank Benford.  I read the article over and over again in a mild disbelief.  Benford set out in precise terms the expected single digit and digit combination frequencies for lists of data.  These expected frequencies are now known as Benford’s Law.

I recorded my thoughts in a paper titled The Frequency Distribution of Digits and its Application to Accounting Research dated July, 1989.  With hindsight the fact that I saw many varied applications was good, the fact that I was advocating only one test was not good.  After much work I completed my dissertation in 1992 titled The Detection of Income Tax Evasion through an Analysis of Digital Distributions.  The taxpayer tax return data set that I used had 100,000 records and the thought of analyzing such a large data set using anything other than a mainframe computer, SAS, and JCL never even occurred to me.  One thing that is obvious with hindsight, is that in the course of writing my dissertation I developed some additional data analysis routines to Benford’s Law.

            In August, 1993 I accepted a position at Saint Mary’s University in Halifax, Nova Scotia.  I was convinced that Benford’s Law on its own was a powerful fraud detection technique.  My breakthrough came a few years later when on July 10, 1995 The Wall Street Journal published a story titled He’s got their number: Scholar uses math to foil financial fraud.  The He was me.

Since that time I’ve published papers on Digital Analysis in academic journals such as The Journal of the American Taxation Association, Auditing: A Journal of Practice and Theory, and The Journal of Accounting Education, and in practitioner journals such as Internal Auditor, Journal of Accountancy, IT Audit Forum, The White Paper, and IEEE Potentials.  My first book Digital Analysis Using Benford’s Law: Tests and Statistics for Auditors is published by Global Audit Publications.  Besides The Wall Street Journal my Benford’s Law work has been discussed in Canadian Business, Technology in Government, Contingencies, The Globe and Mail, CA Magazine, CFO Magazine, The Financial Times, The New York Times, Der Spiegel, Business Week, Dallas Morning News, Philadelphia Inquirer, The Journal of Accountancy, and USA Today.  The papers and the interviews all have two threads in common, (1) they focus on Benford’s Law, and (2) they require any user to write their own data analysis routines.

Over the course of the past two years I’ve developed this book which combines (1) data analysis tests including Benford’s Law, and (2) Microsoft Access.  What we have now is a book that describes sixteen data analysis tests, and how to run these tests in Microsoft Access.  In general, books on data analysis are books on statistics.  The basic building blocks of statistics are probability distributions, hypothesis testing, correlation, regression, analysis of variance, and time series analysis.  The tests described in this book are not statistics, but instead question the data, looking for oddities and anomalies.  These tests aim for high level overviews and then to run the records through the tests looking for things such as signs of procedural inefficiencies, red flags for fraud, and red flags for errors.  Most of the tests are tests derived from the tenet of Benford’s Law which is to identify abnormal duplications in data.  The data analysis tests in Chapter 13 (Histogram, Duplicates, Gaps, and Aging data) were included because they have stood the test of time in auditing circles.

This book should be of use to data analysts whether your job title or function includes the words analyst or auditor.  I have been surprised by the lack of coverage of data analysis in the off-the-shelf books on Access.  This is probably because of the volume of material that there is to cover in a database manual covering, inter alia, data entry, forms, reports, macros, modules, tables and importing/exporting of data.  This book covers data analysis tests and queries in depth.  In general, the data analysis routines require the use of one or more of the following:

bullet

Calculating sums and counts using the Where criteria in Access

bullet

Creating calculated fields to create fields that are calculated using data in other fields

bullet

Using built-in functions for complex calculations

bullet

Grouping records and then getting the Count, or Sum, or Maximum value for the groups

bullet

Identifying duplicate records usually with the Find Duplicates Query Wizard and non-matches with the Find Unmatched Query Wizard.

bullet

Using a Join to query two or more tables for matching records or non-matching records.

This book uses most of Access’ capabilities including a Make-Table query, creating a running sum (cumulative total), creating a table from scratch, identifying items in a sequence that are missing from a sequence.  The conventional use of tables makes finding what is missing particularly challenging.  We import data using File Import, we export output to Excel, we show how to create a UNION query, we go into SQL view and tweak it to meet our needs.  We even turn a Join operation inside out by using Is Null.

            There are three main parts to this book.  First, we use three chapters to review the theory and practical issues associated with data analysis and the architecture of Access.  Chapter 4 is an overview of some basic data analysis routines using Access.  Second, chapters 5 through 13 discuss the tests in those chapters with notes and tips for users.  The notes and tips cover my experiences and feedback from other users.  Third, chapters 5 through 13 show how to do these tests in Access.  The book includes a test data set and the Access screen shots show step-by-step how to do the tests and what your output should look like if you follow the steps using the test data set.

            The chapters describing data analysis tests have a Challenge to Power Users.  Here I briefly describe the logic used to get Access to generate the desired results.  Power users can read about the tests and can then simply follow the logic on their data sets without wading through the remaining pages.  Other users can use these pages as a summary of the Access logic which should make following the steps a little easier.  Some of these sections include a question or challenge for the power users.  Feedback is welcomed.  I’ve also included by way of appendices some resources on the web, including the web address for this book (extra notes, updates etc.) and a glossary of selected Access terms related to data analysis.  Thanks to Microsoft for permission to use this material from their website.

            This book took two years to complete.  Thanks to my Computer Modeling students at SMU for working diligently on their Access projects.  Some of the innovative ideas of Eduardo Carlo, Karen Perry, and Mark Wood are included in this book.  The first year was basically spent confirming that Access could in fact do all the tests known as Digital Analysis and could also do the Chapter 12 (audit sampling) and Chapter 13 (other miscellaneous) tests for both Access 97 and Access 2000.  The second year was spent working on the introductory chapters, writing the summary of the tests, tips and notes for users, and all the ancillary material.

            My thanks to Mark Irving  of Allete in Duluth, MN for working through the drafts of some of the chapters.  Thanks too to Will Yancy for his review of the audit sampling chapter.  I’d also like to thank Duane Hookom, and A. Nijborg for their tips that I got from the Microsoft Access Newsgroups site.  A large measure of thanks also goes to the management and auditors at American Airlines for integrating my data analysis tests into a comprehensive Data Analysis/Access course presented at AMR headquarters.  Steve Proesel has been instrumental over many years in advocating the use of data analysis and up-to-date-software for all aspects of data interrogation at American.  Steven Bemus spent time trying to figure out easier solutions for some of the Access routines and allowed me to “borrow” some of his thoughts for the introductory material.  A special thanks goes to Susan Oldfield for her ideas and for telling me about Snag-It which was used for the screen shots.

            I’d be interested in your feedback for future editions of this book and also for updates for the website www.nigrini.com.  Good luck with your venture into the world of data analysis and data interrogation.

Mark J. Nigrini Ph.D., Dallas, Texas.

January 11, 2002.

TOP

Table of Contents

USING MICROSOFT ACCESS FOR DATA ANALYSIS AND INTERROGATION

 

 

 

 

      1

INTRODUCTION  ………………….………………………………………

 

1

 

     History of data analysis

 

2

 

     Data analysis in auditing

 

3

 

     The layout of this book

 

4

 

 

 

 

      2

DATA ANALYSIS THEORY AND PRACTICAL ISSUES  …………….

 

5

 

     Benford’s Law

 

6

 

     Relationship to logs of numbers

 

11

 

     Applications of Benford’s Law

 

14

 

     Which data sets should conform to Benford’s Law?

 

15

 

     The effect of data set size

 

17

 

     Practical considerations for data analysis

 

18

 

     Obtaining data in a corporate environment

 

19

 

     Summary

 

20

 

 

 

 

      3

THE BASICS OF ACCESS  ……………….…………………………..…..

 

21

 

     The architecture of Access

 

22

 

     Notes on Access tables

 

23

 

     Required properties of database tables

 

23

 

     General tips on reading data into Access

 

25

 

     Queries

 

30

 

     Miscellaneous Access notes

 

33

 

 

 

 

      4

AN OVERVIEW OF DATA IMPORT, QUERIES AND CALCULATED FIELDS 

 

35

 

     Loading a data set into Access

 

36

 

     Basic summary statistics using built-in query wizard

 

48

 

     Selecting the high value records

 

54

 

     Creating a calculated field

 

61

 

     Grouping and counting records

 

64

 

 

 

 

      5

DATA PROFILE AND BASIC DIGIT TESTS  ………………………..…

 

69

 

     Data profile

 

70

 

     Data profile: Tips and notes

 

71

 

     Basic digit tests

 

72

 

     Basic digit tests: Notes and tips

 

73

 

     Challenge to power users

 

75

 

     Importing and reading the accounts payable data set

 

76

 

     Creating the data profile

 

83

 

     Exporting the Access results to Excel

 

87

 

     Performing the basic digit tests

 

91

 

     Graphing the basic digit tests

 

94

 

 

 

 

      6

NUMBER DUPLICATION  ……………………………………………….

 

99

 

     Notes on the number duplication test

 

100

 

     Challenge to power users

 

101

 

     Running the number duplication test

 

102

 

 

 

 

      7

LAST-TWO DIGITS AND ROUND NUMBERS  ………………….……

 

105

 

     Notes on the last-two digits and round numbers tests

 

106

 

     Challenge to power users

 

107

 

     Running the last-two digits test

 

108

 

     Running the round numbers test

 

115

 

 

 

 

      8

LARGEST SUBSETS AND ALL NUMBERS EQUAL  ……………..….

 

119

 

     Notes on the largest subsets test

 

120

 

     Notes on the all numbers equal test

 

122

 

     Challenge to power users

 

123

 

     Running the largest subsets test

 

124

 

     Running the all numbers equal test

 

126

 

 

 

 

      9

NUMBERS OCCUR TWICE AND ROUND NUMBERS  …….………..

 

135

 

     Notes on the numbers occur twice test

 

135

 

     Notes on the round number subsets test

 

136

 

     Challenge to power users

 

137

 

     Running the numbers occur twice test

 

138

 

     Running the round numbers subset test

 

140

 

 

 

 

    10

RELATIVE SIZE FACTOR  ………………………………………………

 

143

 

     Notes on the relative size factor test

 

144

 

     Challenge to power users

 

145

 

     Running the relative size factor test

 

146

 

 

 

 

    11

DUPLICATE RECORDS AND SAME-SAME-DIFFERENT TESTS 

 

157

 

     Notes on the duplicate records test

 

157

 

     Notes on the same-same-different test

 

158

 

     Challenge to power users

 

159

 

     Running the duplicate records test

 

160

 

     Running the same-same-different test

 

164

 

 

 

 

    12

AUDIT SAMPLING  ……………………………………………………….

 

173

 

     Difference estimation using hypothesis testing

 

174

 

     Monetary unit sampling

 

177

 

     Challenge to power users

 

178

 

     Extracting a random sample

 

179

 

     Extracting a MUS sample

 

182

 

     Special notes on the techniques used in this chapter

 

194

 

 

 

 

    13

HISTOGRAM, DUPLICATES, GAPS, AND AGING  ……….……..…..

 

195

 

     Creating a data set in Access

 

196

 

     Challenge to power users

 

187

 

     Preparing a histogram using a union query

 

198

 

     Preparing a histogram using linked queries

 

203

 

     Identifying duplicates in data

 

208

 

     Identifying gaps in data

 

211

 

     Aging data

 

223

 

 

 

 

    14

REVIEW AND CONCLUSIONS  …………………………………………

 

225

 

     Using Access for data analysis

 

226

 

     The format of data analysis queries in Access

 

226

 

     A summary of the data analysis tests

 

227

 

     Continuous monitoring

 

229

 

     Future data analysis tests

 

229

 

     Closing thoughts

 

230

 

 

 

 

 

APPENDIX A: Resources on the Internet

 

231

 

 

 

 

 

APPENDIX B: Bibliography

 

233

 

 

 

 

 

APPENDIX C: Glossary of Selected Access 2002 Terms

 

235

 

 

 

 

 

Index

 

243

 

TOP

Software and data file

Here are the two data files needed to duplicate the work in the book and three Excel files to be used to produce a neat data profile, basic digit tests, and last-two digits graph.

Please download the zip file here: software_and_data_files.zip.  

Author Notes

 

In Appendix A the site http://www.athree.com is shown as a forum for posting of questions. This site was closed in April, 2002.  If you go to this site, you will be redirected to the successor site (utteraccess.com).

 

Tenet Healthcare, Medicare Audit.

Tenet Healthcare is currently the focus of a Medicare audit.  The extract from the article below notes that the Redding Hospital had a heart surgery rate that was the highest in California, and twice the national average.

Audit of suspected Medicare billing fraud

In the Data Analysis Using Access book (described above) the test shown in Chapter 8 (Largest Subsets) was, amongst others, designed to detect instances where a fraud occurs to such an extent that the subset (vendor, employee, hospital, location, frequent flyer account etc.) total was so high that a listing of the subset totals has them ranked near the top.  In many cases fraudsters "just don't know when to stop."

Recommended books on Microsoft Access

These are the books in my Access library that you might consider adding to your library too:

February 19, 2006 update:

I have just spent 30 minutes randomly leafing through my newest book on Microsoft Access and my first impression is that this book is excellent.  The book is Microsoft Access Data Analysis written by Michael Alexander and published by Wiley.  In the first 15 minutes alone I saw (1) a way to add the Rank to the output of a query (I could this before using a Make Table query and adding an ID field but this book's method is better), (2) getting a frequency distribution using one query (I could do this but my method using a linked query takes longer), and (3) parsing strings in a field separated by commas (this is exactly what the output looks like from a ERP system and I could do this using multiple Append queries, but again this book's method is better).  The book also actually talks about things like the Domain Aggregate Functions and the very valuable Switch function.  This book is a must have for anyone that spends time using Access queries.  The book could be called "Queries 'R Us."

My second choice for your library:

"Running Microsoft Access 2000" by John Viescas, published by Microsoft Press (about $45).  This is a good comprehensive A-Z reference on Access. See http://www.viescas.com/Info/books.htm

"Building Microsoft Access Applications" by John Viescas.

Optional extras:

"Access 2002 for Dummies" by John Kaufeld (about $22).  This is a nicely written book that would be good for someone about to take a seminar or about to start using at work - a good first book.  Book can be found in most bookstores and on amazon.

"Access 2002 MOUS Expert Level" by Floyd Winters and Julie Manchester, published by Prentice Hall (about $50).  This book covers some advanced material and does so very well.  You will need Access 2002 to be able to use the files on the CD that comes with the book.  See, http://vig.prenhall.com/catalog/academic/product/0,4096,0130497851,00.html

"Access 2000 Essentials Intermediate" by Robert Ferrett, Sally Preston, and John Preston (about $20).  This book talks about when and why you would want to use some Access objects.  Good sections on forms, reports, and macros.  See, http://vig.prenhall.com/catalog/academic/product/0,4096,1580763014,00.html

"Access 2000 Essentials Advanced" see above.  Book is well suited to someone that has a project up and running and now wants some fine tuning.  See, http://vig.prenhall.com/catalog/academic/product/0,4096,1580763022,00.html

"Database design for mere mortals" by Michael Hernandez (about $50).  This book covers all the design issues in enough detail for an auditor that needs to audit a database application of for someone embarking on a first project of low complexity. See  http://www.formeremortals.com/

 

Recommended websites for users of Microsoft Access

http://www.rogersaccesslibrary.com  (site includes a compendium of over 100 Access routines that show a solution to an Access problem that a user might be faced with.  Site also includes links to other Access libraries, link to Access webring, and articles (called papers) on miscellaneous Access topics.  Well worth a visit.  Roger Carlson answers e-mails sent to him)

Microsoft Access web-based seminars

Institute of Internal Auditors Audit Learning: This web-based training course will review Benford's Law and some other data analysis tests and will show how the tests can be done in Microsoft Excel and Microsoft Access.  CPE hours will be one, two, or three hours, and the web-based Benford's Law seminar should be available in early January, 2003.  The IIA's high-quality web-based seminars are usually $40 per CPE hour.  The IIA is a NASBA approved sponsor for CPE credit.

Benford's Law, Statistics, and Microsoft Access consulting in Germany, see site of Dr. Marcus Herold.
My forthcoming Microsoft Access book's (updated 28 March, 2007):

Using Microsoft Access 12 for Data Analysis and Interrogation.  This will be an update to my current book (shown above) and will be updated as follows: (1) screen shots and methods (such as data import) will be based on  Access 12 (due for release in late 2006), (2) book will include exercises and cases so that it can be used as a college textbook, (3) book will include Excel macro's and Excel procedures so that the tests could be run in Excel 12 with it's forthcoming 1,048,576 rows, and (d) book will include additional tests along the lines of my recent journal publication titled "Monitoring Techniques Available to the Forensic Accountant" published in the Journal of Forensic Accounting (December, 2007).

Available October, 2007.

Continuous Monitoring.  This book will show how to build selected continuous monitoring applications using both Excel 12 and Access 12 that would be of value to Internal Audit and Forensic Accounting departments.  Examples would include a system to place audit units units on a watch-list after being alerted to possible irregularities as a result of Continuous Monitoring activities.  Other examples will be drawn from my Continuous Monitoring seminar and my academic and professional work.

Available middle of 2008.

----------------

TOP

Home
 

Mark J. Nigrini Ph.D.

55 Heath Court, Pennington, New Jersey, 08534

Tel: (609) 303-0533  E-mail: mark_nigrini at msn dot com