Converting Python Code To R
11.0 years ago
ruchiksy ▴ 50

Hello,

I am trying to convert the Python code of RDXplorer to R to make the pre-processing easier and more efficient.

Here is the code in Python:

#!/usr/bin/env python


import pileup as plp
import v6
import globals as glob
import file_utils as fu
import AnalyzeSequence as ans
import os
import sys
import rpy2.robjects as ro
import rpy2.robjects.numpy2ri
import rdxplorer_api as rdxp

if len(sys.argv) < 13:  # sys.argv[12] is read below, so 13 entries are needed
    rdxp.usage()

else:
    path2bam=sys.argv[1]
    reference=sys.argv[2]
    wrkgdir=sys.argv[3]
    chromOfInterest=sys.argv[4]

    gender=sys.argv[5]
    hg=sys.argv[6]
    winSize=sys.argv[7]
    baseCopy=sys.argv[8]
    filter=sys.argv[9]
    sumWithZero=sys.argv[10]
    debug=sys.argv[11]
    delete=sys.argv[12]

    debug=fu.str2bool(debug)
    delete=fu.str2bool(delete)
    sumWithZero=fu.str2bool(sumWithZero)
    baseCopy=int(baseCopy)
    winSize=int(winSize)
    filter=int(filter)

    if rdxp.complainAndBail() == True:
        if debug==True:
            print("The following arguments have been accepted:")
            a=0
            for arg in sys.argv:
                if a==0:
                    print("Program: " + arg)
                elif a==1:
                    print("Bam file name: " + arg)
                else:
                    print ('\t' + arg)
                a=a+1

        accepted_chromosomes = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20","21","22", "X", "Y"]
        if fu.find_first_index(accepted_chromosomes, chromOfInterest) < 0:
            chromOfInterest = 'All'

I have not done any programming in R or Python, so I am wondering how I should go about starting the conversion.

Any tips/help would be much appreciated.

Thanks

python r code conversion • 27k views

You don't know either language, but you think it would be more efficient in R? I would close this question.


Except for huge vectors and matrices, R is far slower than Python.


Programming in R is a mess, compared with Python (which is a mess in its own Pythonic way). What problem are you really trying to solve?

11.0 years ago

Well, realistically, the first step would be to learn R.

After all, if you haven't done any programming in that language, what good would it do to convert the script to R?

There is a lot more to data analysis than having a single converted script.

11.0 years ago
KCC ★ 4.1k

Taking a quick look at this code and the website for RDXplorer, I think you are not correct in your assumption that the pre-processing would be faster in R. Here is my anecdotal argument: this code is hybrid R/Python code. In the code you shared, it imports this module:

import rpy2.robjects

The rpy2 package is a bridge for calling R code from Python. So the original programmers gave some thought to what to implement in Python and what to implement in R. I think a good rule of thumb is that you should at least match the level of sophistication of the original programmers before trying to second-guess them. I hope that doesn't sound too condescending, but you did mention that you don't know either R or Python. So I hope it seems fair to you that I would question whether you can make the right decisions about which parts are best implemented in R vs. Python. (I don't feel I could second-guess them either. I would need to spend a few weeks reading through their code until I felt I knew how all the parts worked and could make general guesses about why certain parts were implemented in R vs. Python.)

Why did you pick this script in particular? It seems like there must be a lot of Python code in this package. The website lists that they also use SciPy and NumPy.

I should say that for simple scripts you can often do a line-by-line conversion. For reasonably complicated programs (which this one seems to be), you would probably need to redesign the whole program, putting in a level of effort similar to what it took to write the original in the first place.

11.0 years ago
Michael 54k

I don't see any computation in this script; it is mainly parameter handling. In fact, it seems like the part of your script that does something non-trivial is missing. So, as your script does nothing, it is very simple to convert to R ;)


Hmmm. I assumed that the packages were not loaded for no reason and that ruchiksy just hadn't pasted the whole script. But I guess you're right, this could be translated to R relatively easily, i.e. complain about the parameters if there are not the right number of them, otherwise store some parameters and do nothing with them. Good catch! +1
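
To make that concrete, here is a rough sketch of everything the pasted part of the script does, written as one plain Python function with no project imports. The argument names and their order come from the question. `str2bool` here is a hypothetical stand-in for `fu.str2bool`, whose real behaviour is not shown in the thread, and `parse_arguments` is a name made up for illustration. This is only a restatement of the snippet under those assumptions, not RDXplorer's actual code.

```python
# Argument names copied from the question; the order follows sys.argv[1..12].
ARG_NAMES = [
    "path2bam", "reference", "wrkgdir", "chromOfInterest",
    "gender", "hg", "winSize", "baseCopy",
    "filter", "sumWithZero", "debug", "delete",
]

# Chromosomes 1..22 plus X and Y, as in the question's accepted_chromosomes list.
ACCEPTED_CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]


def str2bool(text):
    # Hypothetical stand-in for fu.str2bool (assumption: common "true" spellings).
    return text.strip().lower() in ("true", "t", "yes", "1")


def parse_arguments(argv):
    """Return a dict of settings, or None if too few arguments were given."""
    # argv[0] is the program name, so 12 parameters need 13 entries.
    if len(argv) < len(ARG_NAMES) + 1:
        return None

    settings = dict(zip(ARG_NAMES, argv[1:]))

    # Same type conversions as in the original snippet.
    for name in ("winSize", "baseCopy", "filter"):
        settings[name] = int(settings[name])
    for name in ("sumWithZero", "debug", "delete"):
        settings[name] = str2bool(settings[name])

    # Anything that is not a recognised chromosome falls back to 'All'.
    if settings["chromOfInterest"] not in ACCEPTED_CHROMOSOMES:
        settings["chromOfInterest"] = "All"

    return settings
```

Calling `parse_arguments(sys.argv)` (after `import sys`) gives back either the stored settings or `None` when the count is wrong. That is the whole job of the pasted code, which is why translating this part to R would be the easy bit; the real work lives in the modules it imports.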
