Document

Introduction

I. What is ncRNADrug?

ncRNADrug provides a centralized resource for drug and ncRNA associations. It includes experimentally validated and computationally predicted ncRNAs associated with drug resistance, as well as ncRNAs targeted by drugs. Additionally, it offers potential drug combinations for the treatment of resistant cancer.

II. Why create the ncRNADrug database?

Drug resistance is a major barrier in cancer treatment and anticancer drug development. Growing evidence indicates that non-coding RNAs (ncRNAs), especially microRNAs (miRNAs), long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs), play pivotal roles in cancer progression, therapy, and drug resistance. Furthermore, ncRNAs have proven to be promising novel therapeutic targets for cancer treatment. Reversing dysregulated ncRNAs using small-molecule drugs holds significant potential as an effective therapeutic strategy for overcoming drug resistance in cancer.

Due to the importance of ncRNAs in regulating drug resistance, multiple databases collecting associations between ncRNAs and drugs have been developed, including our previously developed SM2miR, D-lnc and ncDR, and others, such as miREnvironment and NoncoRNA. However, these databases tend to focus on only one type of association, either drug target or drug response. No existing database systematically integrates these two types of associations with up-to-date data. Therefore, it is necessary and highly desirable to construct a centralized resource of ncRNAs associated with drug resistance, ncRNAs targeted by drugs, and potential drug combinations for the treatment of resistant cancer. To fill this gap, we developed ncRNADrug, which collects curated and predicted associations between ncRNAs and drugs.

III. What does the ncRNADrug database contain?

ncRNADrug provides a user-friendly, open access web interface for searching, browsing and downloading data. So far, in terms of experimentally validated entries, ncRNADrug contains 29551 entries involving 9195 ncRNAs (2248 miRNAs, 4145 lncRNAs and 2802 circRNAs) associated with the drug resistance of 266 drugs, and 32969 entries involving 10480 ncRNAs (4338 miRNAs, 6087 lncRNAs and 55 circRNAs) targeted by 965 drugs. In terms of predicted entries, ncRNADrug contains 624246 entries involving 134201 ncRNAs (3601 miRNAs, 32892 lncRNAs and 97708 circRNAs) associated with the drug resistance of 5588 drugs, and 285100 predicted entries involving 61602 ncRNAs (5423 miRNAs, 36814 lncRNAs and 19365 circRNAs) targeted by 1303 drugs.

The experimentally validated entries were drawn from more than 9000 published papers. For each entry, the detailed information includes ncRNA information (name, ID, type), drug information (name, DrugBank ID, PubChem CID, FDA approved or not), pattern (up-/down-regulated or resistant/sensitive), ncRNA target and pathway, experimental technique (e.g., qRT-PCR, microarray, RNA-seq), confidence of experiment (low-throughput, high-throughput), species, experimental sample (cell line and/or tissue), phenotype, evidence and reference (PubMed ID, title, published year). In addition, the information in our SM2miR, D-lnc and ncDR databases was also integrated into ncRNADrug. The information about ncRNAs, drugs and phenotypes was further standardized. miRNAs were mapped to miRBase (miRBase Accession). lncRNAs were mapped to Ensembl (Ensembl Gene ID) and NONCODE (NONCODE Gene ID). circRNAs were mapped to circBase (circRNA ID). Drugs were mapped to DrugBank (DrugBank Accession Number), PubChem (PubChem CID) and DTP/NCI (NSC Number). Cancer names were unified according to the definitions in TCGA.

Data processing

I. Data processing for prediction of ncRNAs associated with drug response

GEO

(1). Obtain dataset

We searched all series in the GEO database using the following combination of keywords: ('drug resistance' OR 'drug sensitive' OR 'drug response') AND ('miRNA' OR 'lncRNA' OR 'circRNA'). The filter criteria were as follows:

Study type: 'Non-coding RNA profiling by array', 'Non-coding RNA profiling by genome tiling array' and 'Non-coding RNA profiling by high throughput sequencing'.
Species: 'Homo sapiens', 'Rattus norvegicus' and 'Mus musculus'.

(2). Data preprocessing

Probes without ncRNA names were removed. For lncRNA series that did not have GPL annotation, we mapped the probes to the human genome (GRCh38.p13) with the SeqMap tool (1.0.13) and used GENCODE (Release 43) to determine lncRNA genes. If one probe corresponded to multiple ncRNAs, it was discarded. If an ncRNA had multiple probes, the expression values of all its probes were averaged.
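The probe-collapsing rules above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline actually used: the function name `collapse_probes`, the dictionary-based input layout and the toy probe IDs and values are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_to_ncrnas, probe_expression):
    """Collapse probe-level values to ncRNA-level values for one sample.

    probe_to_ncrnas:  dict, probe ID -> list of ncRNA names (hypothetical layout)
    probe_expression: dict, probe ID -> expression value
    """
    values_by_ncrna = defaultdict(list)
    for probe, value in probe_expression.items():
        names = probe_to_ncrnas.get(probe, [])
        # Skip probes with no ncRNA name and probes that map to several ncRNAs.
        if len(names) != 1:
            continue
        values_by_ncrna[names[0]].append(value)
    # Average the remaining probes that belong to the same ncRNA.
    return {name: mean(vals) for name, vals in values_by_ncrna.items()}


# Toy data (made up for illustration only).
annotation = {
    "p1": ["MALAT1"],
    "p2": ["MALAT1"],
    "p3": ["H19", "MIR675"],  # ambiguous probe, dropped
    "p4": [],                 # unannotated probe, dropped
    "p5": ["H19"],
}
expression = {"p1": 8.0, "p2": 10.0, "p3": 5.0, "p4": 7.0, "p5": 6.0}
collapsed = collapse_probes(annotation, expression)
```

In this toy case the two MALAT1 probes are averaged, while the ambiguous and unannotated probes do not contribute to any ncRNA.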

(3). Differential expression analysis

For series without biological replicates, the fold change is calculated directly as resistant/sensitive;
RNA-seq data with raw count are analyzed by DESeq2;
RNA-seq data with normalized data (like TPM, FPKM) are analyzed by Limma.

The threshold for significantly differentially expressed ncRNAs: p-value < 0.05 and |log2(fold change)| > 1.
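As a rough sketch of how this threshold could be applied, the Python snippet below computes a log2 fold change and checks it against the cutoffs. The function names are hypothetical. One assumption is made explicit here: for series without biological replicates no p-value exists, so the sketch applies only the fold-change cutoff in that case.

```python
import math

P_CUTOFF = 0.05       # p-value < 0.05
LOG2FC_CUTOFF = 1.0   # |log2(fold change)| > 1


def log2_fold_change(resistant, sensitive):
    """log2(resistant / sensitive) for two positive expression values."""
    return math.log2(resistant / sensitive)


def is_significant(log2fc, p_value=None):
    """Apply the cutoffs; p_value=None models a series without replicates
    (assumption: only the fold-change cutoff is used in that case)."""
    if abs(log2fc) <= LOG2FC_CUTOFF:
        return False
    return p_value is None or p_value < P_CUTOFF


lfc = log2_fold_change(8.0, 2.0)            # four-fold higher in resistant
no_replicates = is_significant(lfc)         # fold change only
weak_p = is_significant(lfc, p_value=0.2)   # fails the p-value cutoff
borderline = is_significant(1.0, p_value=0.01)  # exactly 2-fold is not > 1
```

Both inequalities are strict, so an ncRNA with exactly a two-fold change does not pass.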

NCI60

Cancer cell line data. The normalized IC50 values (defined as the compound concentrations required for 50% growth inhibition) across 60 cancer cell lines for 20218 compounds were obtained from the CellMiner database. In addition, the expression levels of 260 miRNAs in the 60 cancer cell lines were acquired and analyzed.

Resistant and sensitive cell lines. For each compound, cell lines with at least 0.8*standard deviation (SD) above the mean normalized IC50 value were defined as resistant to the compound, whereas those with at least 0.8*SD below the mean normalized IC50 value were regarded as sensitive.
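The labeling rule can be illustrated with a short Python sketch. It is not the original implementation: the function name, the "intermediate" label for unclassified cell lines and the toy IC50 values are assumptions, and so is the use of the sample standard deviation (`statistics.stdev`), since the text does not say which SD estimator was used.

```python
from statistics import mean, stdev


def label_cell_lines(ic50_by_cell_line, k=0.8):
    """Label cell lines for one compound from normalized IC50 values."""
    values = list(ic50_by_cell_line.values())
    mu = mean(values)
    sd = stdev(values)  # sample SD; the estimator is an assumption
    labels = {}
    for cell_line, ic50 in ic50_by_cell_line.items():
        if ic50 >= mu + k * sd:
            labels[cell_line] = "resistant"
        elif ic50 <= mu - k * sd:
            labels[cell_line] = "sensitive"
        else:
            # Not used in the resistant-vs-sensitive comparison.
            labels[cell_line] = "intermediate"
    return labels


# Toy normalized IC50 values for hypothetical cell lines.
toy_ic50 = {"line_A": 1.0, "line_B": 2.0, "line_C": 3.0, "line_D": 4.0, "line_E": 5.0}
labels = label_cell_lines(toy_ic50)
```

With these toy values the mean is 3.0 and 0.8*SD is about 1.26, so only the lowest and highest cell lines are labeled.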

Predicted drug resistance-associated miRNAs. For each compound, miRNAs that were significantly differentially expressed between the resistant and sensitive cell lines, as determined by a t-test (p-value < 0.05 and |log2(fold change)| > 1), were retained as drug resistance-related miRNAs.

GDSC and TANRIC

(1). Cancer cell line data

The drug response data (135 compounds across 707 cell lines in total) and lncRNA expression profiles (RPKM values of 12727 lncRNAs in 739 cell lines across 20 tumor types) were obtained from the GDSC and TANRIC (RNA-seq data from CCLE) projects, respectively.

(2). Resistant and sensitive cell lines

For each compound, cell lines with at least 0.8*standard deviation (SD) above the mean normalized IC50 value were defined as resistant to the compound, whereas those with at least 0.8*SD below the mean normalized IC50 value were regarded as sensitive.

(3). Predicted drug resistance-associated lncRNAs

For one drug in a specific cancer type, if the sensitive or resistant class comprised only one cell line, we applied the 'fold change' method to measure the extent of association between drug resistance and lncRNAs (|log2(fold change)| > 1). Otherwise, a t-test was used to screen the differentially expressed lncRNAs based on RPKM values between the resistant and sensitive cell lines (p-value < 0.05 and |log2(fold change)| > 1).
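The choice between the two screening methods depends only on the class sizes, which the following Python sketch makes explicit. The function names are hypothetical. The pseudocount added before taking the ratio is an assumption introduced here to avoid division by zero for lncRNAs with an RPKM of 0; the text does not state how such cases were handled.

```python
import math
from statistics import mean


def choose_method(n_resistant, n_sensitive):
    """Pick the screening method for one drug in one cancer type."""
    if n_resistant < 1 or n_sensitive < 1:
        return None            # no comparison is possible
    if n_resistant == 1 or n_sensitive == 1:
        return "fold_change"   # a class with a single cell line
    return "t_test"


def log2_fc(resistant_rpkm, sensitive_rpkm, pseudocount=1.0):
    """log2 ratio of mean RPKM values (pseudocount is an assumption)."""
    numerator = mean(resistant_rpkm) + pseudocount
    denominator = mean(sensitive_rpkm) + pseudocount
    return math.log2(numerator / denominator)


method = choose_method(n_resistant=1, n_sensitive=2)
value = log2_fc([7.0], [1.0, 1.0])   # log2((7 + 1) / (1 + 1))
passes = abs(value) > 1
```

When `choose_method` returns "t_test", the p-value would come from a statistics package; that step is left out of the sketch.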

II. Data processing for prediction of ncRNAs targeted by drug

GEO

(1). Obtain drug-perturbed dataset

We searched all series in the GEO database using the following combination of keywords: ('drug' OR 'small molecule' OR 'compound') AND ('miRNA' OR 'lncRNA' OR 'circRNA'). The filter criteria were as follows:

Study type: 'Non-coding RNA profiling by array', 'Non-coding RNA profiling by genome tiling array' and 'Non-coding RNA profiling by high throughput sequencing'.
Species: 'Homo sapiens', 'Rattus norvegicus' and 'Mus musculus'.

(2). Data preprocessing

(3). Differential expression analysis

For series without biological replicates, the fold change is calculated directly as treated/control;

For series with biological replicates:

RT-PCR and microarray data are analyzed by Limma;
RNA-seq data with raw count are analyzed by DESeq2;
RNA-seq data with normalized data (like TPM, FPKM) are analyzed by Limma.

The threshold for significantly differentially expressed ncRNAs: p-value < 0.05 and |log2(fold change)| > 1.
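The selection of an analysis method for a drug-perturbed series can be summarized as a small decision function. The Python sketch below only encodes the rules listed above. The function name, argument names and returned strings are hypothetical, and the actual analyses are run with the Limma and DESeq2 R packages rather than in Python.

```python
def pick_method(data_type, has_replicates, raw_counts=False):
    """Return the analysis method for one GEO series.

    data_type:      "RT-PCR", "microarray" or "RNA-seq"
    has_replicates: whether the series has biological replicates
    raw_counts:     for RNA-seq, True for raw counts, False for
                    normalized values such as TPM or FPKM
    """
    if not has_replicates:
        return "fold change (treated/control)"
    if data_type in ("RT-PCR", "microarray"):
        return "Limma"
    if data_type == "RNA-seq":
        return "DESeq2" if raw_counts else "Limma"
    raise ValueError(f"unsupported data type: {data_type}")


choices = [
    pick_method("microarray", has_replicates=False),
    pick_method("RT-PCR", has_replicates=True),
    pick_method("RNA-seq", has_replicates=True, raw_counts=True),
    pick_method("RNA-seq", has_replicates=True, raw_counts=False),
]
```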

CMap

We retrieved 6100 drug-perturbed gene expression datasets from CMap and re-annotated the probes to lncRNAs. The differentially expressed (DE) lncRNAs between drug-treated samples and control samples were considered drug-affected lncRNAs. A two-fold change was used as the cutoff for identifying DE lncRNAs.