Posts by Collection

research

Financial Statement Fraud Detection Using Large Language Models

Last Update: February 17, 2024, Ceteris Paribus – The Undergraduate Journal of Economics at UNC-Chapel Hill

We propose a novel financial statement fraud detection model leveraging recent advances in Large Language Models (LLMs), specifically LLaMA2. By integrating deep learning techniques with domain-specific financial expertise, we demonstrate significant enhancements in the detection of fraudulent activities. Unlike prior studies, our approach utilizes LLMs to systematically analyze a comprehensive combination of numerical and textual data extracted from annual 10-K financial statements. Our findings highlight the potential of LLMs to substantially improve predictive accuracy, offering practical value to auditors, regulators, and investors.

Download Paper

A Machine Learning Approach to Understanding Short Interest and Stock Returns

Last Update: May 06, 2025, Work In Progress

We investigate how asset prices incorporate information through the lens of shorting activity, considering both rational explanations and potential behavioral biases among different types of market participants. We ask whether observed shorting patterns reflect rational updates to new information or instead reveal biases such as extrapolation, overconfidence, or prospect theory–type behavior. We adapt recent machine-learning-based approaches from empirical asset pricing by replacing future returns with shorting volume as the dependent variable. By leveraging a large cross-section of firm-level and macro-level predictors, encompassing both standard risk factors and variables linked to behavioral tendencies, we aim to uncover which signals dominate short-seller behavior and how strongly those signals propagate into market prices.

Sustainable Asset Pricing with Heterogeneous Agents: A Computational Approach

Last Update: November 15, 2025, Senior Honors Thesis

We introduce investors with heterogeneous environmental, social, and governance (ESG) preferences, who derive non-pecuniary utility from green asset portfolio weights, into a general equilibrium asset pricing model with incomplete markets. Their preference for green assets induces a ‘green premium,’ whose magnitude and persistence are determined by the interaction between portfolio constraints and wealth dynamics. We show that binding portfolio constraints are essential: they prevent traditional, profit-maximizing investors from arbitraging the premium away, so the resulting market incompleteness allows the green premium to be sustained, and they amplify preference transmission during market stress. Calibrated to German twin bond data, our solution reveals that the green premium is not static but fluctuates endogenously with the dynamic wealth distribution between ESG-conscious and traditional investors. We solve the model’s high-dimensional state space under both constant relative risk aversion (CRRA) and Epstein-Zin recursive preferences using deep equilibrium networks and interpolation methods, enabling a precise analysis of the wealth-reallocation mechanisms, dynamics previously invisible to representative agent frameworks.

Daily Market Return Prediction with Transformer

Last Update: December 25, 2025, Working Paper

We apply a Transformer encoder to forecast daily market returns using lagged market returns over horizons of 5, 20, and 60 days. Both the direct model forecasts and post-machine learning forecasts exhibit significant predictive power for next-day returns, while simple averages of past returns do not. Relative to linear predictive regressions, the machine learning forecasts deliver sizable improvements in out-of-sample R-squared. A mean-variance analysis with a risk-aversion coefficient of two shows that the Transformer prediction generates an average return of 30% per annum with a Sharpe ratio of 1.3. The predictability is more pronounced in recessions and periods of elevated investor sentiment. Random Forests and feed-forward Neural Networks also yield economically meaningful, though somewhat weaker, results.
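The evaluation quantities mentioned above (out-of-sample R-squared, a mean-variance allocation with risk aversion of two, and the Sharpe ratio) can be sketched as follows. This is only an illustrative sketch and not the paper's code: the function names, the weight bounds of 0 and 1.5, the 252-day annualization, and the choice of benchmark forecast are all assumptions made for this example.

```python
import numpy as np


def oos_r2(realized, forecast, benchmark):
    """Out-of-sample R-squared of `forecast` relative to `benchmark`.

    Positive values mean the forecast has a lower sum of squared
    errors than the benchmark (assumed here to be supplied by the caller,
    e.g. a historical-mean or linear-regression forecast).
    """
    realized = np.asarray(realized, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    benchmark = np.asarray(benchmark, dtype=float)
    sse_model = np.sum((realized - forecast) ** 2)
    sse_bench = np.sum((realized - benchmark) ** 2)
    return 1.0 - sse_model / sse_bench


def mean_variance_weight(forecast, variance, gamma=2.0, lower=0.0, upper=1.5):
    """Weight on the market for a mean-variance investor.

    w = forecast / (gamma * variance), clipped to [lower, upper].
    The bounds are an assumption for this sketch, not taken from the paper.
    """
    raw = np.asarray(forecast, dtype=float) / (gamma * np.asarray(variance, dtype=float))
    return np.clip(raw, lower, upper)


def annualized_sharpe(excess_returns, periods_per_year=252):
    """Annualized Sharpe ratio from per-period excess returns (252 days assumed)."""
    r = np.asarray(excess_returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)
```

In use, one would pass a series of next-day return forecasts and a variance estimate to `mean_variance_weight`, multiply the resulting weights by realized excess returns, and summarize the strategy with `annualized_sharpe`.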

Co-authors: Yufeng Han (UNC Charlotte), Guofu Zhou (Washington University in St. Louis)

Download Paper | Download Sample Code