Abstract:
Using the universe of Armenian business taxpayers operating under a standard tax regime, we develop a fraud prediction model based on machine learning tools, with gradient boosting as the primary choice. Despite broadly defined fraud, heterogeneous taxpayers, and a relatively small sample, we derive important features from tax returns with a minimum of additional information. The important fraud predictors include historical fraud and audits, the share of administrative costs, and external economic activity. We see two main contributions with generalizable practical implications for auditing authorities. First, by focusing on the lift score of the top decile, we demonstrate that even moderately accurate models can improve upon the accuracy of existing rule-based approaches. Second, and more importantly, we demonstrate that the information contained in a taxpayer's supplier and buyer network can be used whenever important predictors of fraud, such as historical audits and fraud, are not available. This is particularly important for newly established companies, which would otherwise be assigned too low a fraud probability.
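For readers unfamiliar with the metric, the lift score of the top decile can be read as the fraud rate among the 10% of taxpayers with the highest predicted scores, divided by the fraud rate in the whole sample. The sketch below is only an illustration of that common definition, not the implementation used in the paper; the function name `top_decile_lift` and its arguments are hypothetical.

```python
import numpy as np


def top_decile_lift(y_true, scores):
    """Fraud rate in the top-scored 10% divided by the overall fraud rate.

    y_true: 0/1 labels (1 = fraud found); scores: model scores, higher = riskier.
    Both names are illustrative assumptions, not taken from the paper.
    """
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    if y_true.ndim != 1 or y_true.shape != scores.shape or y_true.size == 0:
        raise ValueError("y_true and scores must be non-empty 1-D arrays of equal length")

    base_rate = y_true.mean()
    if base_rate == 0:
        raise ValueError("lift is undefined when the sample contains no fraud cases")

    # Size of the top decile, rounded up so that it holds at least one taxpayer.
    n = y_true.size
    k = max(1, -(-n // 10))

    # Indices of the k highest-scored taxpayers (stable sort keeps ties in input order).
    top = np.argsort(-scores, kind="stable")[:k]
    return y_true[top].mean() / base_rate
```

A lift of 1 means that auditing the top-scored decile finds fraud no more often than auditing at random, while a lift of 3 means fraud is found three times as often as the sample average.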