Abstract

The growing volume and variety of data in the financial domain offer unprecedented opportunities to understand the stock market more comprehensively and to predict prices more accurately than before. However, they also pose challenges for classic statistical approaches, which are often restricted to a single type of data. To aggregate information from different sources and to make existing models agnostic to data type, we establish a framework for predicting the stock market in scenarios with mixed data, including scalar data, compositional data (pie-like), and functional data (curve-like). The framework is model-independent: it serves as an interface to multiple types of data and can be combined with various prediction models. Numerical simulations show that the framework is effective. For price prediction, we use the framework to incorporate trading volume (scalar data), intraday return series (functional data), and investors' emotions expressed on social media (compositional data), and we forecast the market trend at the next day's opening with good accuracy. We further demonstrate the framework's strong explanatory power. Specifically, intraday returns are found to affect the next day's opening price differently in bearish and bullish markets. In addition, investors' "fear" becomes indicative not at the onset of a bearish market but in the period that follows it. The framework can help extend existing prediction models to scenarios with multiple types of data and provide a more systematic understanding of the stock market.