I am trying to convert an hg38 GTF file into a nested array so that I can run a binary search on it. I want to nest the array by position, with the first level holding the chromosomes in sorted order:
    chromosomes = []
    for i in range(1, 23):
        chromosomes.append(i)
The second level would be the strand (+, -):
    strands = ['+', '-']
    for i in range(0, len(chromosomes)):
        chromosomes[i] = strands
The third level would hold the start and end positions, and the final level would be a list of attributes such as transcript_id and gene_id.
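To make the target concrete, here is a minimal sketch of the layout I have in mind, written as a dict keyed by chromosome and strand rather than as positional lists. The coordinates and IDs are made-up placeholders, `find_overlaps` is a hypothetical helper name, and the sketch assumes the intervals for each chromosome/strand are already sorted by start:

```python
from bisect import bisect_right

# Hypothetical toy data: chromosome -> strand -> list of (start, end, attributes),
# sorted by start. Coordinates and IDs are placeholders, not real hg38 records.
nested = {
    "1": {
        "+": [
            (100, 200, {"gene_id": "geneA", "transcript_id": "txA1"}),
            (150, 400, {"gene_id": "geneB", "transcript_id": "txB1"}),
            (500, 900, {"gene_id": "geneC", "transcript_id": "txC1"}),
        ],
        "-": [
            (300, 350, {"gene_id": "geneD", "transcript_id": "txD1"}),
        ],
    },
}

def find_overlaps(nested, chrom, strand, pos):
    """Return the attributes of every interval on chrom/strand that contains pos."""
    intervals = nested.get(chrom, {}).get(strand, [])
    # Rebuilt on every call to keep the sketch short; for real use the list of
    # starts would be computed once per chromosome/strand and reused.
    starts = [start for start, _end, _attrs in intervals]
    # Binary search: intervals[:hi] are exactly those whose start is <= pos.
    hi = bisect_right(starts, pos)
    # Intervals can overlap, so each candidate is still checked against its end.
    return [attrs for _start, end, attrs in intervals[:hi] if pos <= end]
```

With this layout, `find_overlaps(nested, "1", "+", 160)` would return the attribute dicts for both placeholder intervals that cover position 160.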
I am not sure of the best way to iterate through the loaded GTF table (sorted_df below) to fill this array of arrays. I have this so far, but I can't tell whether it is working or just taking a long time because of the file size:
    for i in range(0, len(chromosomes)):
        for index, row in sorted_df.iterrows():
            if (str(i) == row['chr']) & (chromosomes[i] == row['strand']):
                positions = []
                positions.append(row['start'], row['end'])
                chromosomes[i] = positions
                positions.clear()
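One alternative I have been considering is a single pass that groups records as it reads them, instead of rescanning the whole table once per chromosome. This is only a rough sketch under some assumptions: it reads the raw tab-separated GTF lines rather than my DataFrame, relies on the standard nine-column GTF layout (chromosome in column 1, start and end in columns 4 and 5, strand in column 7, attributes in column 9), and uses the hypothetical names `build_nested` and `ATTR_RE` plus made-up example records:

```python
import re
from collections import defaultdict

# Matches quoted GTF attribute pairs such as: gene_id "geneA"; transcript_id "txA1";
# Unquoted values (some files write numeric attributes that way) are skipped.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def build_nested(lines):
    """Group GTF records as chromosome -> strand -> sorted [(start, end, attrs)]."""
    nested = defaultdict(lambda: defaultdict(list))
    for line in lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        attrs = dict(ATTR_RE.findall(fields[8]))
        nested[chrom][strand].append((start, end, attrs))
    # Sort each group once by (start, end) so it can be binary searched later.
    for strands in nested.values():
        for intervals in strands.values():
            intervals.sort(key=lambda rec: (rec[0], rec[1]))
    return nested

# Made-up records for illustration only, deliberately out of order.
example_lines = [
    "#!placeholder header line\n",
    'chr1\tsrc\texon\t500\t900\t.\t+\t.\tgene_id "geneC"; transcript_id "txC1";\n',
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";\n',
    'chr1\tsrc\texon\t300\t350\t.\t-\t.\tgene_id "geneD"; transcript_id "txD1";\n',
]
nested_example = build_nested(example_lines)
```

For a real file I would presumably pass an open file handle in place of `example_lines`, since the function only needs something that yields lines.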
Is this the right way of thinking about this problem or is there a better way to approach it? Any help would be appreciated.