Abstract: This study presents a Transformer-based multi-modal architecture for predicting box office revenue by integrating diverse data sources: textual, visual, and numerical features. The proposed ...