I am having trouble extracting Ensembl gene IDs from a messy data frame.
First, I loaded a CSV file into R (the file was not actually comma-separated), and it looks like this:
> my_csv_file
  ensembl_gene_id.entrezgene_id.hgnc_symbol.gene_biotype
1 1 ENSG00000174365 128439 SNHG11 lncRNA
2 2 ENSG00000180385 NA EMC3-AS1 transcribed_unprocessed_pseudogene
3 3 ENSG00000183562 NA lncRNA
4 4 ENSG00000205266 NA KRT17P5 transcribed_unprocessed_pseudogene
5 5 ENSG00000206585 26864 RNVU1-7 snRNA
6 6 ENSG00000206588 NA RNU1-28P snRNA
Then I tried to extract the Ensembl gene ID from each row using the sub() function.
For example, for row number 1:
> sub("^\\d", "", my_csv_file[1, ])
[1] " ENSG00000174365 128439 SNHG11 lncRNA"
However, I'm stuck: I don't know how to write a regular expression that also removes everything after the Ensembl ID (the Entrez ID, gene symbol, and biotype), or how to apply it to every row in a for loop.
I appreciate your help.
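
For reference, here is a minimal sketch of the kind of extraction described above. It makes several assumptions: the data frame has a single column holding one whole row as a string, every Ensembl gene ID has the form "ENSG" followed by digits, and the small data frame built here is only a hypothetical stand-in for the real file. Because regexpr() and regmatches() are vectorized, an explicit for loop may not be needed.

```r
# Hypothetical stand-in for the data frame shown above:
# a single column where each row is one long string.
my_csv_file <- data.frame(
  ensembl_gene_id.entrezgene_id.hgnc_symbol.gene_biotype = c(
    "1 ENSG00000174365 128439 SNHG11 lncRNA",
    "2 ENSG00000180385 NA EMC3-AS1 transcribed_unprocessed_pseudogene",
    "3 ENSG00000183562 NA lncRNA"
  ),
  stringsAsFactors = FALSE
)

# Pull the single column out as a character vector.
rows <- as.character(my_csv_file[[1]])

# Locate the first "ENSG" + digits in every string at once.
# regexpr() returns -1 for strings without a match.
m <- regexpr("ENSG[0-9]+", rows)

# regmatches() returns only the matched substrings, so fill them into
# a vector of NAs to keep one entry per original row.
ensembl_ids <- rep(NA_character_, length(rows))
ensembl_ids[m != -1] <- regmatches(rows, m)

ensembl_ids
```

Matching the ID directly, rather than deleting the text before and after it, avoids having to describe the other fields in the pattern, which matters here because some rows appear to lack a gene symbol.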