Research

Market Microstructure

We develop two new methods for matching trades and bid-ask quotes that account for information latency in the era of fast trading. The first method adjusts for exchange-to-SIP latency. The second method constructs exchanges' Relative Best Bid and Offer (RBBO) based on exchange-to-exchange latency and data center co-locations. We test these trade classification methods using over 650 million TAQ trades matched with order executions in the TAQ Integrated Feed. We find that the first method improves the Lee and Ready (1991) trade classification accuracy from 86% to 92%. Our preferred method, the RBBO method, improves accuracy by a further 58 bp but is computationally expensive. Using an exogenous technological shock, we find that adjusting for latency could alter research inferences when measuring liquidity.

[SAS code]

Key message in one chart: 

As exchange routing activity rises (green), existing trade classification methods that assume zero exchange-to-exchange latency (SIP time in orange, Direct NBBO time in purple) become less accurate. The latency-adjusted methods proposed in our paper (grey and navy) classify buys and sells more accurately than the existing methods on every day.
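To make the role of the quote-matching timestamp concrete, here is a minimal Python sketch of the standard Lee and Ready (1991) rule: the quote rule (compare the trade price with the prevailing midpoint) with a tick-test fallback at the midpoint. It is not the paper's SAS implementation. The function name `classify_trades`, the input layout, and the single `latency` parameter are illustrative assumptions; in particular, shifting the trade timestamp by one constant is only a stand-in for the exchange-to-SIP and exchange-to-exchange adjustments described in the paper.

```python
from bisect import bisect_right


def classify_trades(trades, quotes, latency=0.0):
    """Sign trades with the Lee-Ready quote rule plus tick-test fallback.

    trades:  list of (time, price), sorted by time (hypothetical layout).
    quotes:  list of (time, bid, ask), sorted by time (hypothetical layout).
    latency: assumed constant (same units as time) subtracted from each
             trade timestamp before the quote lookup. Illustrative only.
    Returns a list of +1 (buy), -1 (sell), or 0 (unclassified).
    """
    quote_times = [q[0] for q in quotes]
    signs = []
    prev_price = None
    last_tick = 0  # sign of the most recent non-zero price change

    for time, price in trades:
        # Update the tick-test state from the previous trade price.
        if prev_price is not None and price != prev_price:
            last_tick = 1 if price > prev_price else -1
        prev_price = price

        # Latest quote at or before the latency-shifted trade time.
        i = bisect_right(quote_times, time - latency) - 1
        if i < 0:
            signs.append(last_tick)  # no quote yet: tick test only
            continue

        _, bid, ask = quotes[i]
        mid = (bid + ask) / 2
        if price > mid:
            signs.append(1)
        elif price < mid:
            signs.append(-1)
        else:
            signs.append(last_tick)  # at the midpoint: tick test
    return signs
```

When a quote update lands just before a trade, the choice of `latency` decides which quote the trade is compared with, and that can flip the assigned sign. Prices are plain floats here for brevity; exact midpoint comparisons on real data would be safer with integer or decimal prices.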

Boehmer et al. (2021) propose a methodology to infer retail trades from publicly available NYSE Trade and Quote (TAQ) data. Their methodology relies on assumptions about what types of orders do and do not trade on non-quote-midpoint sub-penny increments via the Trade Reporting Facility (TRF). We obtain proprietary data from one or more wholesalers known to receive marketable orders from retail brokers. We use these data to demonstrate that the Boehmer et al. (2021) methodology identifies less than one-third of trades generally assumed to be from retail investors and analyze cross-sectional determinants of the technique’s identification rate. In addition, we obtain proprietary data on institutional trades from multiple sources and demonstrate that a large number of such trades print on the TRF at non-quote-midpoint sub-penny prices in violation of the assumption that institutional orders trade only on penny or half-penny increments. Thus, there are both Type I and Type II errors that affect the ability to identify retail and only retail trades from TAQ using the Boehmer et al. (2021) methodology. Finally, we demonstrate that these errors can produce different inferences regarding the association between lagged retail order imbalance measures and stock returns.
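For readers unfamiliar with the sub-penny rule being evaluated, the following Python sketch shows the commonly described form of the Boehmer et al. (2021) filter. The thresholds (fraction of a cent strictly between 0 and 0.4 for sells, strictly between 0.6 and 1 for buys) and the use of TAQ exchange code "D" for TRF prints come from the published description of the method rather than from this page, so treat them as assumptions. The function name `bjzz_flag` is hypothetical.

```python
from decimal import Decimal


def bjzz_flag(price, venue):
    """Label one trade as 'retail_buy', 'retail_sell', or None.

    price: trade price as a string or number (converted to Decimal to
           avoid floating-point error in the sub-penny remainder).
    venue: exchange code; 'D' is assumed to mark TRF prints.
    """
    if venue != "D":
        return None  # only off-exchange (TRF) prints are considered

    # Fraction of a cent, in [0, 1): e.g. 10.0099 -> 0.99
    z = (Decimal(str(price)) * 100) % 1

    if Decimal("0") < z < Decimal("0.4"):
        return "retail_sell"  # seller received slightly more than a round penny
    if Decimal("0.6") < z < Decimal("1"):
        return "retail_buy"   # buyer paid slightly less than a round penny
    return None               # round-penny or near-half-penny price


# Example: a TRF print at 10.0099 is labeled a retail buy,
# while a half-penny print at 10.005 is left unlabeled.
labels = [bjzz_flag(p, "D") for p in ("10.0099", "10.0001", "10.005", "10.01")]
```

Under this rule, retail trades that execute at round-penny or midpoint prices are missed, and institutional trades printing at non-midpoint sub-penny prices are mislabeled as retail. These are the two error types the abstract above describes.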

Why should we care?

The Boehmer et al. (2021) algorithm (BJZZ) classifies about 7% of total trades in today's market as retail, far less than industry reports on OTC volume at the TRF suggest (for example, Nasdaq considers 31% of total traded share volume on OTC to be "mostly retail"; see the chart on the right). Using retail wholesaler trading data, our paper finds that BJZZ captures about one-third of total retail activity, which reconciles the gap between BJZZ and industry reports on retail activity. Researchers using the BJZZ method to quantify retail activity should be aware of this limitation.

Reference: Nasdaq Economic Research by Phil Mackintosh.

Investments