Search results for: 'Superpixel semantics representation and pre-training for vision-language tasks'