Abstract: Open-vocabulary semantic segmentation attempts to classify and outline objects in an image using arbitrary text labels, including those unseen during training. Self-supervised learning ...