Hello, I am trying to convert a BED file into a VCF file using Python. I started by parsing the BED file, then wrote a separate module that opens the reference genome so I can compare it against the BED file. The next step is to write the contents of rows to a file called P_1_BZ.vcf. However, I get an error saying that rows needs to be a string before it can be written to the VCF file. I tried using join() to convert rows to a string, but the format changes and the wrong information ends up in the new file. Can someone help me understand how to correctly write the contents of rows to the VCF file?

#!/usr/bin/env python

import Coronavirus

#Deletions that occur in B_1_351_SA
deletion1=Coronavirus.referenceGenome(11287,11296)
deletion2=Coronavirus.referenceGenome(22280,22289)

with open("B_1_351_SA.bed") as fo:
    lines = fo.readlines()

    #this is the header for what the vcf file needs
    header = ["CHROM        ","POS","   ID","   REF","  ALT"]
    print(header)
    rows = []
    # print(header)
    for i in range(0,len(lines)):
        # print(lines[i])

        # for the first two rows since the start and end SNP  contain 3 digits
        # focus on the first 2 rows
        l = [0,1]
        if(i in l):
            # print(lines[i])
            rows= [
                lines[i][0:11].split(),  #CHROM
                lines[i][11:15].split(), #POS : the position that it started
                lines[i][19:25].split(), #ID : position that shows where was the alteration
                lines[i][20:21].split(), #REF : reference nucleotide
                lines[i][24:25].split()  #ALT : nucleotide it changed to
                ]
        l = [2,3,4]
        if(i in l):
            rows = [
                lines[i][0:11].split(),  #CHROM
                lines[i][11:16].split(), #POS : the position that it started
                lines[i][21:28].split(), #ID : position that shows where was the alteration
                lines[i][22:23].split(), #REF : reference nucleotide
                lines[i][27:28].split()   #ALT : nucleotide it changed to
            ]
        l = [5,7,8,10,11,12,13,14,15,16,17,18]
        if(i in l):
            rows = [
                lines[i][0:11].split(),  #CHROM
                lines[i][11:18].split(), #POS : the position that it started
                lines[i][24:32].split(), #ID : position that shows where was the alteration
                lines[i][24:25].split(), #REF : reference nucleotide
                lines[i][30:31].split()  #ALT : nucleotide it changed to
            ]

        #row 6 contains a deletion, so it is handled separately
        l = [6]
        if i in l:
            rows = [
                lines[i][0:11].split(),  #CHROM
                lines[i][11:18].split(), #POS : the position that it started
                lines[i][24:34].split(), #ID : name of the deletion
                deletion1                #REF : deleted bases from the reference genome
            ]

        #row 9 contains a deletion, so it is handled separately
        l = [9]
        if i in l:
            rows = [
                lines[i][0:11].split(),  #CHROM
                lines[i][11:18].split(), #POS : the position that it started
                lines[i][24:34].split(), #ID : name of the deletion
                deletion2                #REF : deleted bases from the reference genome
            ]
with open("P_1_BZ.vcf","w+") as vcf:
    vcf.write(rows)
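
The fixed offsets in the loop above only line up when every position has the same number of digits, which is why the row indices have to be grouped by hand. Below is a minimal sketch of an alternative that splits each line on whitespace instead. It assumes the BED columns are chrom, start, end, name (inferred from the slice offsets, not confirmed) and that SNP names look like C12778T. The helper parse_bed_line and the two sample lines are hypothetical and only for illustration.

```python
import re

# Assumption: a SNP name is REF base + position + ALT base, e.g. "C12778T".
SNP_NAME = re.compile(r"^([ACGT])(\d+)([ACGT])$")

def parse_bed_line(line):
    """Hypothetical helper: return [chrom, pos, id, ref, alt] for a SNP line,
    or [chrom, pos, id] when the name is not a SNP (e.g. a deletion)."""
    fields = line.split()  # splits on tabs or spaces, any column width
    chrom, start, name = fields[0], fields[1], fields[3]
    match = SNP_NAME.match(name)
    if match:
        ref, _, alt = match.groups()
        return [chrom, start, name, ref, alt]
    return [chrom, start, name]

# Illustrative lines; the end coordinates are assumed, not taken from the real file.
sample_lines = [
    "NC_045512v2\t12777\t12778\tC12778T\n",
    "NC_045512v2\t11287\t11296\tdel_11288\n",
]

# Collect every parsed line instead of overwriting a single variable.
rows = [parse_bed_line(line) for line in sample_lines]
```

With this approach each field is already a plain string, so no per-row index lists are needed; the deletion rows would still need their REF sequence appended from the reference genome.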

This is a glimpse of what rows contains and what I am trying to write to P_1_BZ.vcf:

[['NC_045512v2'], ['11287'], ['del_11288'], ['TCTGGTTTT']]
[['NC_045512v2'], ['12777'], ['C12778T'], ['C'], ['T']]
[['NC_045512v2'], ['13859'], ['C13860T'], ['C'], ['T']]
[['NC_045512v2'], ['14407'], ['C14408T'], ['C'], ['T']]
[['NC_045512v2'], ['17258'], ['G17259T'], ['G'], ['T']]
[['NC_045512v2'], ['21613'], ['C21614T'], ['C'], ['T']]
[['NC_045512v2'], ['21620'], ['C21621A'], ['C'], ['A']]
[['NC_045512v2'], ['21637'], ['C21638T'], ['C'], ['T']]
[['NC_045512v2'], ['21973'], ['G21974T'], ['G'], ['T']]
[['NC_045512v2'], ['22131'], ['G22132T'], ['G'], ['T']]
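
For reference, write() accepts a single string, and each row shown above is a list of one-element lists, so every row has to be flattened and joined before it is written. Here is a minimal sketch, assuming the loop appends each row to a list (called all_rows here, a hypothetical name) instead of reassigning rows on every iteration. The helpers format_row and write_rows are also hypothetical, and the two rows are copied from the sample above.

```python
import io

def format_row(row):
    """Flatten one-element lists like ['C'] and join the fields with tabs."""
    fields = [item[0] if isinstance(item, list) else str(item) for item in row]
    return "\t".join(fields) + "\n"

def write_rows(handle, rows):
    """Write a tab-separated column header followed by one line per row."""
    handle.write("#CHROM\tPOS\tID\tREF\tALT\n")
    for row in rows:
        handle.write(format_row(row))

# Two rows in the same nested shape as the output shown above.
all_rows = [
    [["NC_045512v2"], ["12777"], ["C12778T"], ["C"], ["T"]],
    [["NC_045512v2"], ["11287"], ["del_11288"], ["TCTGGTTTT"]],
]

# An in-memory buffer stands in for the real file in this sketch.
buffer = io.StringIO()
write_rows(buffer, all_rows)
output = buffer.getvalue()
```

With a real file, the same function can be called as write_rows(vcf, all_rows) inside with open("P_1_BZ.vcf", "w") as vcf. Note that a spec-compliant VCF also needs a ##fileformat line and the QUAL, FILTER, and INFO columns; this sketch only writes the five columns used in the post.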


