July 2023
Traditional representations of firms use accounting and financial market data, but investors use richer information sets. In theory, portfolio holdings contain all information relevant for asset prices, and this information is recoverable under empirically realistic conditions. Building on recent advances in machine learning and artificial intelligence, we develop asset embeddings that leverage portfolio holdings to represent firms, similar to word embeddings leveraging document structure. We evaluate different methods of estimating asset embeddings on three new benchmarks. We also develop investor embeddings to represent investors and their strategies. We economically interpret asset (investor) embeddings by applying large language models to firm- (investor-)level text data.
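A toy sketch can make the analogy with word embeddings concrete. It treats each asset as a "word" and each investor's portfolio as a "document". A truncated singular value decomposition of the resulting holdings matrix, the idea behind latent semantic analysis, then gives each asset a low-dimensional vector, and assets held by the same investors in similar proportions end up close together. This is an illustrative assumption, not necessarily one of the estimation methods evaluated here. The helper names (`asset_embeddings`, `cosine`) and the portfolio weights are hypothetical.

```python
import numpy as np


def asset_embeddings(weights: np.ndarray, k: int) -> np.ndarray:
    """Embed assets from a (num_investors, num_assets) matrix of portfolio weights.

    Hypothetical helper: assets play the role of words and portfolios the role
    of documents, as in latent semantic analysis (an assumed method, for illustration).
    """
    # Rows = assets, columns = investors (a "term-document" matrix).
    x = weights.T
    # Thin SVD: x = U @ diag(s) @ Vt, singular values sorted in descending order.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    # Keep the top-k components; row i is the embedding of asset i.
    return u[:, :k] * s[:k]


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Made-up toy data: 4 investors (rows) x 5 assets (columns); each row sums to 1.
weights = np.array([
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.3, 0.3, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.0, 0.5, 0.5],
])

emb = asset_embeddings(weights, k=2)
print(emb.shape)               # (5, 2)
print(cosine(emb[0], emb[1]))  # ~1.0: assets 0 and 1 are held by the same investors in the same proportions
print(cosine(emb[0], emb[3]))  # ~0.0: no investor holds both assets
```

Assets 0 and 1 have identical columns of holdings, so their embeddings coincide. Asset 0 and asset 3 never appear in the same portfolio, so their embeddings are orthogonal. In this sketch the SVD step could be swapped for a richer estimator without changing the "holdings as documents" framing.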