Retrieving specific columns from ClinicalTrials.gov using R
8.9 years ago
niveditaj20 ▴ 50

I have been trying to pull specific columns for a disease using the "rclinicaltrials" package, but I have been unable to do so.

I would appreciate your suggestions.

I've tried using this code:

melanom <- clinicaltrials_search(query = c("cond=melanoma", "phase=2", "type=Intr", "rslt=With"), count=5)

For example, I want data for melanoma from ClinicalTrials.gov, with specific columns such as:

  • Title
  • Phase
  • Recruitment status
  • Compound

but it gives me the default output below, with all the default columns:

score:                0.99384
nct_id:               NCT01723800
url:                  http://ClinicalTrials.gov/show/NCT01723800
title:                PI3K Inhibitor BKM120 Carboplatin, and Pemetrexed Disodium in Treating Patients With Stage IV Non-Small Cell Lung Cancer
status.text:          Active, not recruiting
condition_summary:    Adenocarcinoma of the Lung; Bronchoalveolar Cell Lung Cancer; Large Cell Lung Cancer; Recurrent Non-small Cell Lung Cancer; Stage IV Non-small Cell Lung Cancer
intervention_summary: Drug: PI3K inhibitor BKM120; Drug: pemetrexed disodium; Drug: carboplatin; Other: laboratory biomarker analysis; Other: pharmacological study; Procedure: quality-of-life assessment
last_changed:         May 5, 2015

R • 2.0k views
8.9 years ago
Steven Lakin ★ 1.8k

You want to use the clinicaltrials_download() function once you have identified the studies you want to use. For example, here is a walk-through of what you need to do:

library(rclinicaltrials)

# How many melanoma entries are in the database?
clinicaltrials_count(query="melanoma")
[1] 1737

# Download all information about 10 of these trials
melanom <- clinicaltrials_download(query="melanoma", count=10, include_results=TRUE)
str(melanom)

List of 2
 $ study_information:List of 6
  ..$ study_info   :'data.frame':    10 obs. of  28 variables:
  .. ..$ org_study_id                      : chr [1:10] "NEI-23" "000001" "920105" ...
  .. ..$ nct_id                            : chr [1:10] "NCT00000124" "NCT00001144"...
  .. ..$ brief_title                       : chr [1:10] "Collaborative Ocular ...
...

# Subset the data according to the parameters you want
melanom_subset <- list(
    data.frame(
        melanom$study_information$study_info$brief_title,
        melanom$study_information$study_info$phase,
        melanom$study_information$study_info$overall_status
    ),
    data.frame(
        melanom$study_information$interventions$intervention_type,
        melanom$study_information$interventions$intervention_name
    )
)

str(melanom_subset)

List of 2
 $ :'data.frame':    10 obs. of  3 variables:
  ..$ melanom.study_information.study_info.brief_title   : Factor w/ 10 levels ...
  ..$ melanom.study_information.study_info.phase         : Factor w/ 4 levels "N/A" ...
  ..$ melanom.study_information.study_info.overall_status: Factor w/ 3 levels ...
 $ :'data.frame':    14 obs. of  2 variables:
  ..$ melanom.study_information.interventions.intervention_type: Factor ...
  ..$ melanom.study_information.interventions.intervention_name: Factor ...

# Rename the columns of each data frame in the list
names(melanom_subset[[1]]) <- c("Title", "Phase", "Recruitment")
names(melanom_subset[[2]]) <- c("InterventionType", "InterventionName")
head(melanom_subset)

And now you have a list of two data frames, the first being the Title, Phase, and Recruitment status of each trial, and the second data frame being the interventions used. You can subset the larger table to get whatever information you'd like from it.
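
The same selection can also be written by indexing columns by name, which avoids the long auto-generated column names. The following is only a minimal sketch on a small made-up data frame that stands in for melanom$study_information$study_info; the rows are invented placeholders rather than real trial records, and the column names are the ones used in the code above.

```r
# Mock stand-in for melanom$study_information$study_info (values are invented)
study_info <- data.frame(
    nct_id         = c("NCT-A", "NCT-B", "NCT-C"),
    brief_title    = c("Trial A", "Trial B", "Trial C"),
    phase          = c("Phase 2", "N/A", "Phase 3"),
    overall_status = c("Completed", "Recruiting", "Completed"),
    stringsAsFactors = FALSE
)

# Keep only the columns of interest, selected by name
wanted <- c("brief_title", "phase", "overall_status")
trial_subset <- study_info[, wanted]

# Give the columns friendlier names
names(trial_subset) <- c("Title", "Phase", "Recruitment")

print(trial_subset)
```

With the real download, study_info would be replaced by melanom$study_information$study_info.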

If you'd like to query with more advanced parameters, try using these:

head(advanced_search_terms)

     keys   description                                                   help
term term  Search Terms        http://clinicaltrials.gov/ct2/help/search_terms
recr recr   Recruitment         http://clinicaltrials.gov/ct2/help/recruitment
rslt rslt Study Results       http://clinicaltrials.gov/ct2/help/study_results
type type    Study Type          http://clinicaltrials.gov/ct2/help/study_type
cond cond    Conditions    http://clinicaltrials.gov/ct2/help/conditions_instr
intr intr Interventions http://clinicaltrials.gov/ct2/help/interventions_instr
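
These keys are combined with values as "key=value" strings, as in the query vector from the question. As a rough sketch, a small helper can assemble such a vector from named arguments; build_query is a hypothetical name and is not part of rclinicaltrials.

```r
# Hypothetical helper (not part of rclinicaltrials): turn named values into
# "key=value" strings of the kind used in the query vector
build_query <- function(...) {
    args <- list(...)
    paste0(names(args), "=", unlist(args))
}

query <- build_query(cond = "melanoma", type = "Intr", rslt = "With")
print(query)
```

The resulting vector could then be passed as the query argument, for example clinicaltrials_search(query = query, count = 5), as in the question.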

Thanks a lot, it works!

I have one more question:

Is it possible to write the two data frames together? Since they have different numbers of rows, I have not been able to. How do I write them?


I'm happy it helped; could you click the "accept answer" checkbox next to my answer? Thanks.

I wasn't sure about the data frame size either; my guess is that one or more of those trials used more than one intervention (e.g. one trial used two or three different interventions), so the sizes of the dataframes aren't equal. You could force them to be merged, but you'd have to add "NA"s for some of the rows, which wouldn't be very meaningful. If you still want to force them into one data frame, copy and paste the following into R (assuming you used the names that I used before):

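# Column-bind objects with different row counts, padding the shorter ones with NA
cbind.fill <- function(...){
    nm <- list(...)
    nm <- lapply(nm, as.matrix)
    # Row count of the longest input
    n <- max(sapply(nm, nrow))
    # Append NA rows to each matrix so all have n rows, then bind the columns
    do.call(cbind, lapply(nm, function (x)
        rbind(x, matrix(NA, n-nrow(x), ncol(x)))))
}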

mergedData <- cbind.fill(melanom_subset[[1]], melanom_subset[[2]])
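
An alternative to padding with NA may be to join the two tables on the trial identifier, so that each intervention row carries its trial's title, phase, and status. The sketch below uses small made-up data frames and assumes the interventions table also has an nct_id column; the str() output in the answer only confirms nct_id for study_info, so it is worth checking names(melanom$study_information$interventions) first. The file name is a placeholder.

```r
# Mock stand-ins for the two tables (values are invented)
study_info <- data.frame(
    nct_id         = c("NCT-A", "NCT-B"),
    brief_title    = c("Trial A", "Trial B"),
    phase          = c("Phase 2", "Phase 3"),
    overall_status = c("Completed", "Recruiting"),
    stringsAsFactors = FALSE
)
interventions <- data.frame(
    nct_id            = c("NCT-A", "NCT-A", "NCT-B"),  # assumed key column
    intervention_type = c("Drug", "Procedure", "Drug"),
    intervention_name = c("Compound X", "Biopsy", "Compound Y"),
    stringsAsFactors = FALSE
)

# One row per intervention, with the trial-level columns repeated
merged <- merge(study_info, interventions, by = "nct_id")

# Write the combined table to a CSV file (a temporary file is used here)
out_file <- file.path(tempdir(), "melanoma_trials.csv")
write.csv(merged, out_file, row.names = FALSE)
```

A trial with several interventions then appears on several rows, which keeps each intervention tied to the right trial instead of relying on row position.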