Text this: Infrared and Visible Image Fusion Based on Autoencoder Composed of CNN-Transformer