Speech Emotion Recognition Using Convolution Neural Networks and Multi-Head Convolutional Transformer