Modern ERP systems can contain millions, if not billions, of transactions and balances. Auditing that volume of data efficiently and effectively using traditional sampling and tabular methods is difficult, if not impossible. While charts and graphs have long been tools for illustrating audit findings and results, exploratory data visualization techniques can help you gain insights and identify audit issues that spreadsheets and sampling cannot. In this webinar, you will learn three exploratory data visualizations that you can immediately incorporate into your audits.
Learning Objectives: After completion of this program you will be able to:
- Understand which datasets are best audited using exploratory visualizations
- Learn how to create three exploratory data visualizations (linear regression, variance, and pattern identification) to immediately begin using in your audits
- Understand how to link data visualization indicators with actual transactional or balance related findings
- Learn how to make data visualizations an understandable part of your audit report
Q&A from attendee-submitted questions and comments:
That is correct. The validity of the data being pulled is essential. A valid method such as regression applied to incorrect, incomplete, or misinterpreted data will, of course, result in more risk, not less. As is the case in any audit testing, a pre-step is ensuring that the data pulled is tied out for completeness, ideally to trial balance account balances or supporting sub-ledger detail (e.g., purchases hitting accounts payable, or, in the case of inventory purchases related to COGS, purchases broken down into inventory and non-inventory components). Data retrieval is always one of the biggest challenges in audit, and ensuring the data is clean and correct even more so. As to ensuring correct interpretation of the data (e.g., the relationship between the chosen X's and Y's), this is a key part of developing the hypothesis: just as in the scientific method, the hypothesis should be an educated, informed conjecture about what the relationship should be, based on an understanding of the business. As in our example, if you are selling inventory, it will need to be replenished. As with any statistical tool, care should be exercised not to take the result as an audit finding, but instead as an indicator that a key business process relationship has changed from what was expected. From there, the auditor should ask questions.
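The tie-out itself can be scripted rather than done by hand. Below is a minimal sketch in Python, under assumed inputs: the file names (purchase_detail.csv, trial_balance.csv), column names, and the control account label are all hypothetical placeholders for whatever your extract actually contains.

```python
import pandas as pd

# Hypothetical tie-out: confirm the detailed purchases extract is complete
# by agreeing its total to the trial balance control account movement.
detail = pd.read_csv("purchase_detail.csv")    # assumed columns: account, amount
tb = pd.read_csv("trial_balance.csv")          # assumed columns: account, period_movement

detail_total = detail["amount"].sum()
# "2100-Accounts Payable" is a placeholder control account name.
tb_total = tb.loc[tb["account"] == "2100-Accounts Payable", "period_movement"].sum()

difference = detail_total - tb_total
print(f"Detail: {detail_total:,.2f}  TB: {tb_total:,.2f}  Difference: {difference:,.2f}")
if abs(difference) > 0.01:  # small tolerance for rounding
    print("Extract does not tie out; resolve completeness before any analysis.")
```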
Overfitting or underfitting will result in bad conclusions. This is why it is very important not only to generate the visual, but also to check the statistics generated by the regression tool. The first (see slide 23) is Multiple R. A Multiple R close to 1 (say, a tolerance of no less than .85) indicates that there is a valid relationship between the X and Y variables; in our example, that inventory purchases (Y) are dependent on cost of goods sold (COGS, the X). This is based on the assumption that inventory purchases will ultimately be related to what an organization is selling. That said, the auditor can't stop there: even if you can force a fit, the data may not fit the model well. The measure of whether the data fits the relationship well is R Square. As with Multiple R, R Square should be near 1 (say, a tolerance of no less than .85). If you have a Multiple R and an R Square close to 1 (within a reasonable tolerance), then the regression is a good fit without unreasonable over- or underfitting.
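The webinar demonstrates this check with Excel's regression tool; for readers who prefer to script it, here is a minimal sketch in Python using hypothetical monthly figures. It computes the single-X equivalents of Excel's Multiple R and R Square and applies the 0.85 tolerance discussed above.

```python
import numpy as np
from scipy import stats

# Hypothetical monthly figures; in practice these come from the tied-out GL data.
cogs = np.array([110, 125, 98, 140, 132, 150, 119, 160, 145, 170, 138, 155], dtype=float)       # X
purchases = np.array([105, 122, 101, 138, 128, 149, 117, 158, 147, 166, 140, 152], dtype=float)  # Y

# Simple linear regression of inventory purchases (Y) on COGS (X).
result = stats.linregress(cogs, purchases)

multiple_r = abs(result.rvalue)   # "Multiple R" on Excel's regression output
r_square = result.rvalue ** 2     # "R Square"

TOLERANCE = 0.85
print(f"Multiple R = {multiple_r:.3f}, R Square = {r_square:.3f}")
if multiple_r < TOLERANCE or r_square < TOLERANCE:
    print("Relationship weaker than expected; investigate before relying on the model.")
else:
    print("Relationship and fit look reasonable; proceed to variance analysis.")
```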
None that have been specifically developed for this presentation, but the same steps would apply.
1) Develop a hypothesis; gather data for both prior and current periods; test the hypothesis; conclude on the hypothesis (is there actually a relationship between the values?); test the current period and analyze the results; follow up on significant differences; and present the results.
2) A good starting point might be identifying revenue sources (e.g., account fees might be a possible Y) and then making educated guesses as to the drivers of those fees (average account balances, overdrafts, etc. might be possible X's). Then gather the data and test the hypothesis (or hypotheses), as sketched after this list.
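To make the banking example concrete, here is a hedged sketch in Python of testing such a hypothesis with two candidate drivers. All figures are invented for illustration, and the driver choices (average balances, overdraft counts) are assumptions standing in for whatever your business understanding suggests.

```python
import numpy as np

# Hypothetical monthly data for a bank: Y = account fee revenue,
# candidate X drivers = average account balances and overdraft counts.
fees = np.array([52, 55, 49, 61, 58, 64, 57, 66, 63, 70, 60, 68], dtype=float)  # Y
avg_balances = np.array([410, 430, 395, 470, 455, 500, 445, 515, 505, 540, 480, 525], dtype=float)
overdrafts = np.array([120, 128, 115, 142, 135, 150, 131, 155, 149, 163, 141, 158], dtype=float)

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones_like(fees), avg_balances, overdrafts])
coef, _, _, _ = np.linalg.lstsq(X, fees, rcond=None)

# R Square: proportion of variance in fees explained by the drivers.
predicted = X @ coef
ss_res = np.sum((fees - predicted) ** 2)
ss_tot = np.sum((fees - fees.mean()) ** 2)
r_square = 1 - ss_res / ss_tot
multiple_r = np.sqrt(r_square)

print(f"Coefficients (intercept, balances, overdrafts): {np.round(coef, 3)}")
print(f"Multiple R = {multiple_r:.3f}, R Square = {r_square:.3f}")
# If both are near 1 (e.g., >= 0.85), the hypothesized drivers look valid;
# otherwise revisit the choice of X variables.
```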
There are more complex software tools that can perform regression analysis on very large data sets (over 1 million rows), but there are a few workarounds for using large data within Excel. The best is Excel Power Query. Power Query is the ETL (extract, transform, and load) engine included with Excel that lets a user work with data sets far larger than Excel's worksheet row limit. A user can pull the large data set into Power Query and then use grouping to summarize it into a manageable data set that fits within that limit. For example, if you are working with 10 million rows covering a year of activity, you can group that data (transactions) into days, which will give you at most 365 rows. Just remember that if you group the data for your X values (cost of sales in our presentation example) into days, you must also group the Y values in the same manner.
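For readers who script their data preparation instead, the same grouping idea looks like this in Python with pandas. This is an analogy to Power Query's Group By step, not the webinar's Excel workflow, and the file and column names (transactions.csv, date, amount) are hypothetical.

```python
import pandas as pd

# transactions.csv is a hypothetical extract with columns: date, account, amount.
tx = pd.read_csv("transactions.csv", parse_dates=["date"])

# Collapse millions of transaction rows into one row per day, so the result
# fits comfortably within Excel's worksheet row limit.
daily = (
    tx.groupby(tx["date"].dt.date)["amount"]
      .sum()
      .reset_index(name="daily_total")
)

daily.to_csv("daily_totals.csv", index=False)  # import this summary into Excel
```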
It is generally better to avoid using statistical analysis terms when following up with an auditee. Instead, ask the questions in business-friendly language, going from general to more specific. For example, if, based on the linear regression-based variance analysis, inventory purchases appear unusual for a month, quarter, etc., it is better to ask, "When looking over activity in Q1, I noticed that inventory purchases were higher (or lower) compared to prior period(s). Were there any significant changes in the business or supply chain that this is related to?" Then, based on the response, ask more specific questions.