I’m currently working on plant enhancers, specifically those of Arabidopsis and maize. I have collected the enhancer data reported in related articles, but the model also needs non-enhancer sequences as negative input. My idea is to randomly extract fragments from regions outside the known enhancers, treat them as non-enhancers, and then use CD-HIT to control redundancy. Is this method reasonable?
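To make the idea concrete, here is a minimal sketch of the sampling step as I picture it. Everything in it is an assumption for illustration: the function names, the toy chromosome sizes, and the toy enhancer coordinates are made up, and coordinates are taken to be 0-based and half-open as in BED files. It draws random intervals by rejection sampling, discards any that overlap a known enhancer or an earlier pick, and reuses the enhancer lengths so that the negatives have a similar length distribution (my own choice, not something the articles prescribe).

```python
import random


def overlaps(start, end, intervals):
    """Return True if half-open [start, end) overlaps any (s, e) in intervals."""
    return any(start < e and s < end for s, e in intervals)


def sample_negatives(chrom_sizes, enhancers, n, seed=0, max_tries=100000):
    """Rejection-sample up to n intervals that avoid all known enhancers.

    chrom_sizes: dict mapping chromosome name -> length in bp
    enhancers:   list of (chrom, start, end), 0-based half-open
    """
    rng = random.Random(seed)

    # Regions that a new fragment must not touch, grouped by chromosome.
    blocked = {chrom: [] for chrom in chrom_sizes}
    for chrom, start, end in enhancers:
        blocked[chrom].append((start, end))

    # Reuse enhancer lengths so negatives and positives are length-matched.
    lengths = [end - start for _, start, end in enhancers]

    # Longer chromosomes are picked proportionally more often.
    chroms = list(chrom_sizes)
    weights = [chrom_sizes[c] for c in chroms]

    negatives = []
    tries = 0
    while len(negatives) < n and tries < max_tries:
        tries += 1
        chrom = rng.choices(chroms, weights=weights)[0]
        length = rng.choice(lengths)
        if length > chrom_sizes[chrom]:
            continue
        start = rng.randint(0, chrom_sizes[chrom] - length)
        end = start + length
        if overlaps(start, end, blocked[chrom]):
            continue
        # Block the accepted fragment too, so negatives never overlap each other.
        blocked[chrom].append((start, end))
        negatives.append((chrom, start, end))
    return negatives


# Toy data, purely hypothetical.
chrom_sizes = {"chr1": 100000, "chr2": 50000}
enhancers = [("chr1", 1000, 1500), ("chr1", 4000, 4300), ("chr2", 2000, 2600)]

negatives = sample_negatives(chrom_sizes, enhancers, n=20)
for chrom, start, end in negatives:
    print(chrom, start, end, sep="\t")
```

The output is a BED-like list of coordinates. Extracting the actual sequences from the genome and running CD-HIT on them are not shown here.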

If you have a better solution, please let me know~


