Self-Supervised Hypergraph Learning for Enhanced Multimodal Representation