Fine-tuning image selection

Hi all, I am fine-tuning an OCR model for document analysis. I have a bunch of images and need to filter some of them out, keeping only the ones that meet criteria like:

→ Contains tables
→ More than 60% of the page is text
→ Contains special elements such as signatures, stamps, etc.

Any suggestions on how to do this?
Thanks in advance!

Curating the dataset is a human job. You can pay people to help you with it; there are websites where you can post this kind of task.
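Human review stays the final step, but a rough automatic pre-filter could shrink the pile before anyone looks at it. Below is a minimal sketch, assuming you already have, for each image, word bounding boxes from some OCR engine plus yes/no flags from a table detector and a stamp/signature detector. Those detectors are not shown and are assumptions, as are the names `PageInfo`, `text_coverage` and `keep_page`. "60% text" is interpreted here as the fraction of page area covered by word boxes, and a page is kept if it meets any one criterion.

```python
from dataclasses import dataclass, field


@dataclass
class PageInfo:
    """Per-image metadata, assumed to come from upstream tools (hypothetical)."""
    width: int
    height: int
    # Word boxes as (x0, y0, x1, y1) in pixels, e.g. word-level OCR output.
    word_boxes: list[tuple[int, int, int, int]] = field(default_factory=list)
    has_table: bool = False               # assumed output of a table detector
    has_stamp_or_signature: bool = False  # assumed output of a stamp/signature detector


def text_coverage(page: PageInfo) -> float:
    """Fraction of the page area covered by word boxes.

    Assumes word boxes do not overlap (overlaps would be double-counted),
    so the result is clamped to 1.0.
    """
    page_area = page.width * page.height
    if page_area <= 0:
        return 0.0
    text_area = sum(
        max(0, x1 - x0) * max(0, y1 - y0) for x0, y0, x1, y1 in page.word_boxes
    )
    return min(1.0, text_area / page_area)


def keep_page(page: PageInfo, min_text_coverage: float = 0.6) -> bool:
    """Return True if the page meets at least one selection criterion."""
    criteria = [
        page.has_table,
        text_coverage(page) > min_text_coverage,
        page.has_stamp_or_signature,
    ]
    # Assumption: ANY criterion is enough; use all(criteria) if every one must hold.
    return any(criteria)


pages = {
    "invoice.png": PageInfo(100, 100, word_boxes=[(0, 0, 80, 80)]),  # 64% text
    "photo.png": PageInfo(100, 100, word_boxes=[(0, 0, 10, 10)]),    # 1% text, nothing else
    "contract.png": PageInfo(100, 100, has_stamp_or_signature=True),
}
selected = [name for name, p in pages.items() if keep_page(p)]
print(selected)  # ['invoice.png', 'contract.png']
```

Whatever this keeps is only a shortlist. A person should still check it, since the detectors and the coverage heuristic will make mistakes.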