I want to obtain the features of a single region directly and then perform the similarity matching between the region feature and the text feature myself. How should I proceed? I don’t want to input the whole image and predict the bounding boxes first, as the tutorials for extracting region features describe.