Fetch a table from the ClinVar database for a list of rsIDs
9 days ago
ashaneev07 ▴ 20

Hi,

I have a list of rsIDs that I want to search against the ClinVar database, printing the condition_germline column for each rsID. I have a script:

use strict;
use warnings;
use LWP::Simple;
use HTML::TableExtract;

# Read list of rsids from file
my $rsids_file = 'rsids.txt';
open(my $fh, '<', $rsids_file) or die "Can't open $rsids_file: $!";
my @rsids = <$fh>;
close($fh);

# Loop through each rsid
foreach my $rsid (@rsids) {
    chomp($rsid);

    # Construct URL for ClinVar search
    my $url = "https://www.ncbi.nlm.nih.gov/clinvar/variation/$rsid/";

    # Fetch web content
    my $content = get($url);
    unless (defined $content) {
        warn "Couldn't get $url: ", $!;
        next;
    }

    # Extract table
    my $te = HTML::TableExtract->new(headers => ["Condition_Germline"]);
    $te->parse($content);

    # Print condition_germline column
    foreach my $ts ($te->tables) {
        foreach my $row ($ts->rows) {
            print join("\t", @$row), "\n";
        }
    }
}

But when it runs, I get the following errors:

Couldn't get https://www.ncbi.nlm.nih.gov/clinvar/variation/rs11203366/:  at .\fetch_condition.pl line 22.
Couldn't get https://www.ncbi.nlm.nih.gov/clinvar/variation/rs11203367/:  at .\fetch_condition.pl line 22.
Couldn't get https://www.ncbi.nlm.nih.gov/clinvar/variation/rs874881/:  at .\fetch_condition.pl line 22.
Couldn't get https://www.ncbi.nlm.nih.gov/clinvar/variation/rs776453694/:  at .\fetch_condition.pl line 22.
Couldn't get https://www.ncbi.nlm.nih.gov/clinvar/variation/rs80324279/:  at .\fetch_condition.pl line 22.
Couldn't get https://www.ncbi.nlm.nih.gov/clinvar/variation/rs324420/:  at .\fetch_condition.pl line 22.
Couldn't get https://www.ncbi.nlm.nih.gov/clinvar/variation/rs112766203/:  at .\fetch_condition.pl line 22.

I appreciate your suggestions.

Thank you.

python clinvar perl • 462 views

Don't forget to follow up on your threads. If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work. If an answer was not really helpful or did not work, provide detailed feedback so others know not to use that answer.


Hi, I have rewritten the script in Python, but it still reports that no data was found. The data is there: I printed the parsed HTML and it contains the conditions, but conditions_germline = soup.find('Conditions-Germline') returns nothing. Is there a problem with this line? I have attached the script below. Kindly have a look. Thank you.

>     import requests
>     from bs4 import BeautifulSoup
>     
>     
>     def search_clinvar_by_rsid(rsid):
>         url = f"https://www.ncbi.nlm.nih.gov/clinvar/?term={rsid}"
>         try:
>             response = requests.get(url)
>             if response.status_code == 200:
>                 soup = BeautifulSoup(response.content, 'html.parser')
>                 #print(soup)
>                 conditions_germline = soup.find('Conditions-Germline')
>                 print(conditions_germline)
>                 if conditions_germline:
>                     conditions_text = conditions_germline.text.strip()
>                     print(f"conditions_germline data for rsid '{rsid}':")
>                     print(conditions_text)
>                 else:
>                     print(f"No conditions_germline data found for rsid '{rsid}'")
>             else:
>                 print(f"Failed to retrieve search results for rsid '{rsid}'. Status code:", response.status_code)
>         except requests.RequestException as e:
>             print("Error:", e)
>     
>     
>     def read_rsids_from_file(filename):
>         with open(filename, 'r') as file:
>             return [line.strip() for line in file]
>     
>     rsids = read_rsids_from_file('rsids.txt')
>     for rsid in rsids:
>         search_clinvar_by_rsid(rsid)
9 days ago
$ cat input.rs

rs11203366
rs11203367
rs874881
rs776453694
rs80324279
rs324420
rs112766203

 wget -qO - "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz" |\
   bcftools query -i "`cat input.rs  | sed 's/^rs//' | awk '{printf("%sRS=\x27%s\x27",(NR==1?"":" || "),$1);}'`" -f '%CHROM %POS %REF %ALT rs%RS %CLNDN\n'

1  17331039  G  A  rs11203366   Rheumatoid_arthritis|Abnormal_pulmonary_interstitial_morphology|PADI4-related_condition
1  17331121  T  C  rs11203367   Rheumatoid_arthritis|Abnormal_pulmonary_interstitial_morphology|PADI4-related_condition
1  17334004  G  C  rs874881     Rheumatoid_arthritis|Abnormal_pulmonary_interstitial_morphology|PADI4-related_condition
1  21838914  C  T  rs776453694  Schwartz-Jampel_syndrome_type_1|Inborn_genetic_diseases|Schwartz-Jampel_syndrome|Lethal_Kniest-like_syndrome
1  33013330  G  C  rs80324279   Reticular_dysgenesis
1  46405089  C  A  rs324420     FAAH-related_condition|FAAH_POLYMORPHISM|Polysubstance_abuse,_susceptibility_to
1  97305279  G  A  rs112766203  not_provided|Dihydropyrimidine_dehydrogenase_deficiency

Ah, and your code doesn't work because a ClinVar ID is not an rsID: the /clinvar/variation/ URL expects ClinVar's own numeric variation ID, not a dbSNP identifier.
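
If the web pages are still wanted, one possible route is to translate each rsID into ClinVar record IDs first. The sketch below uses the NCBI E-utilities ESearch endpoint with db=clinvar. It assumes that searching ClinVar by an rs number returns the matching records and that the returned IDs can be used in the /clinvar/variation/ URL; neither is confirmed in this thread, so check a few by hand. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(rsid):
    """Build an ESearch URL that looks up one rsID in the clinvar database."""
    params = {"db": "clinvar", "term": rsid.strip(), "retmode": "json"}
    return EUTILS + "?" + urlencode(params)


def parse_id_list(payload):
    """Return the list of record IDs from an ESearch JSON response body."""
    return json.loads(payload).get("esearchresult", {}).get("idlist", [])


def variation_urls(rsid):
    """Query ESearch (needs network access) and build variation page URLs.

    Assumption: the IDs returned for db=clinvar are usable as variation IDs.
    """
    with urlopen(esearch_url(rsid), timeout=30) as response:
        ids = parse_id_list(response.read().decode("utf-8"))
    return [f"https://www.ncbi.nlm.nih.gov/clinvar/variation/{i}/" for i in ids]
```

Calling variation_urls("rs324420") would then give the page URLs to fetch instead of a URL built directly from the rsID. For more than a handful of rsIDs, the VCF approach above avoids sending one request per variant.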


So which one should I prefer? I don't need the entire results, only the condition-Germline table (for example, from https://www.ncbi.nlm.nih.gov/clinvar/variation/294920/ I need just the condition-Germline table). When I check manually with the rsID, I do get the result, so I'm confused. Could you please clarify?
