Good day! First, I'd like to say great work on this!
I am trying to reproduce the results reported here, focusing on COCO (Novel, 31.4) and LVIS (Novel, 22.0).
Shown below is the bash command I'm using to evaluate your fine-tuned open-vocabulary detector on COCO.
python3 ./tools/train_net.py \
--eval-only \
--num-gpus 1 \
--config-file ./configs/COCO-InstanceSegmentation/CLIP_fast_rcnn_R_50_C4_ovd_testt.yaml \
MODEL.WEIGHTS ./pretrained_ckpt/regionclip/regionclip_finetuned-coco_rn50.pth \
MODEL.CLIP.OFFLINE_RPN_CONFIG ./configs/COCO-InstanceSegmentation/mask_rcnn_R_50_C4_1x_ovd_FSD.yaml \
MODEL.CLIP.BB_RPN_WEIGHTS ./pretrained_ckpt/rpn/rpn_coco_48.pth \
MODEL.CLIP.TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_48_base_cls_emb.pth \
MODEL.CLIP.OPENSET_TEST_TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_17_target_cls_emb.pth
After running inference, I got the following warning:
WARNING [10/30 14:32:12 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
offline_backbone.bottom_up.res2.0.conv1.norm.{bias, weight}
offline_backbone.bottom_up.res2.0.conv1.weight
offline_backbone.bottom_up.res2.0.conv2.norm.{bias, weight}
offline_backbone.bottom_up.res2.0.conv2.weight
offline_backbone.bottom_up.res2.0.conv3.norm.{bias, weight}
offline_backbone.bottom_up.res2.0.conv3.weight
offline_backbone.bottom_up.res2.0.shortcut.norm.{bias, weight}
offline_backbone.bottom_up.res2.0.shortcut.weight
offline_backbone.bottom_up.res2.1.conv1.norm.{bias, weight}
offline_backbone.bottom_up.res2.1.conv1.weight
offline_backbone.bottom_up.res2.1.conv2.norm.{bias, weight}
offline_backbone.bottom_up.res2.1.conv2.weight
offline_backbone.bottom_up.res2.1.conv3.norm.{bias, weight}
offline_backbone.bottom_up.res2.1.conv3.weight
offline_backbone.bottom_up.res2.2.conv1.norm.{bias, weight}
offline_backbone.bottom_up.res2.2.conv1.weight
offline_backbone.bottom_up.res2.2.conv2.norm.{bias, weight}
offline_backbone.bottom_up.res2.2.conv2.weight
offline_backbone.bottom_up.res2.2.conv3.norm.{bias, weight}
offline_backbone.bottom_up.res2.2.conv3.weight
offline_backbone.bottom_up.res3.0.conv1.norm.{bias, weight}
offline_backbone.bottom_up.res3.0.conv1.weight
offline_backbone.bottom_up.res3.0.conv2.norm.{bias, weight}
offline_backbone.bottom_up.res3.0.conv2.weight
offline_backbone.bottom_up.res3.0.conv3.norm.{bias, weight}
offline_backbone.bottom_up.res3.0.conv3.weight
offline_backbone.bottom_up.res3.0.shortcut.norm.{bias, weight}
offline_backbone.bottom_up.res3.0.shortcut.weight
offline_backbone.bottom_up.res3.1.conv1.norm.{bias, weight}
offline_backbone.bottom_up.res3.1.conv1.weight
offline_backbone.bottom_up.res3.1.conv2.norm.{bias, weight}
offline_backbone.bottom_up.res3.1.conv2.weight
offline_backbone.bottom_up.res3.1.conv3.norm.{bias, weight}
offline_backbone.bottom_up.res3.1.conv3.weight
offline_backbone.bottom_up.res3.2.conv1.norm.{bias, weight}
offline_backbone.bottom_up.res3.2.conv1.weight
offline_backbone.bottom_up.res3.2.conv2.norm.{bias, weight}
offline_backbone.bottom_up.res3.2.conv2.weight
offline_backbone.bottom_up.res3.2.conv3.norm.{bias, weight}
offline_backbone.bottom_up.res3.2.conv3.weight
offline_backbone.bottom_up.res3.3.conv1.norm.{bias, weight}
offline_backbone.bottom_up.res3.3.conv1.weight
offline_backbone.bottom_up.res3.3.conv2.norm.{bias, weight}
offline_backbone.bottom_up.res3.3.conv2.weight
....
The evaluation results are also very low.

Could anyone give me some advice?
Note: I am using Python 3.9 and torch 1.9.1+cu111.
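In case it helps with diagnosis, here is a minimal sketch of how I could check whether a checkpoint contains any parameters under a given prefix such as `offline_backbone.`. The helper names `summarize_key_prefixes` and `missing_under_prefix` and the toy key lists are my own illustration, not part of the repository.

```python
from collections import Counter


def summarize_key_prefixes(keys, depth=1):
    """Count parameter names by their first `depth` dot-separated components."""
    counts = Counter(".".join(key.split(".")[:depth]) for key in keys)
    return dict(counts)


def missing_under_prefix(expected_keys, checkpoint_keys, prefix):
    """Return expected names starting with `prefix` that the checkpoint lacks."""
    available = set(checkpoint_keys)
    return sorted(
        key for key in expected_keys
        if key.startswith(prefix) and key not in available
    )


# Toy key lists for illustration only (hypothetical, not taken from a real file).
checkpoint_keys = [
    "backbone.res2.0.conv1.weight",
    "roi_heads.box_predictor.cls_score.weight",
]
expected_keys = checkpoint_keys + [
    "offline_backbone.bottom_up.res2.0.conv1.weight",
]

prefix_counts = summarize_key_prefixes(checkpoint_keys)
missing = missing_under_prefix(expected_keys, checkpoint_keys, "offline_backbone.")

print(prefix_counts)
print(missing)
```

With a real file, the key list would come from something like `torch.load(path, map_location="cpu")`. I am assuming the weights may be nested under a `"model"` entry, which I have not verified for these checkpoints.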