Discrover

Discrover is a motif discovery method to find binding sites of nucleic acid-binding proteins.

The corresponding publication is:
Binding site discovery from nucleic acid sequences by discriminative learning of hidden Markov models
Jonas Maaskola and Nikolaus Rajewsky
Nucleic Acids Research, 42(21):12995-13011, Dec 2014. doi:10.1093/nar/gku1083

Software

The software of Discrover is distributed under the GNU General Public License v3 and available on GitHub.
There you can retrieve the source code; we also provide binary packages for selected Linux distributions, including Debian, Fedora, and Ubuntu.

Synthetic sequence data for motif discovery performance evaluation

Please refer to the Materials section of the publication for details on how the data were generated.

Sequence archives

There are three sets of experiments (basic, 3'UTR, and decoy), and we provide a separate download for each.

Archive structure

The downloads for the basic and 3'UTR experiments contain paired FASTA files for the signal and control sequences in the following directory structure:

EXPNAME/nAAA/lenBBB/pCCC_klDDD.signal.fa
EXPNAME/nAAA/lenBBB/pCCC_klDDD.control.fa

For the decoy experiments the structure is as follows:

EXPNAME/nAAA/lenBBB/pCCC_klDDD_rpEEE_rklFFF.signal.fa
EXPNAME/nAAA/lenBBB/pCCC_klDDD_rpEEE_rklFFF.decoy_control.fa
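Because every signal file has exactly one matching control file, the pair can be found from the file name alone. The following Python sketch relies only on the naming scheme shown above; the function name control_path_for and the example parameter values are hypothetical and not part of the distributed data or of Discrover.

```python
from pathlib import PurePosixPath

SIGNAL_SUFFIX = ".signal.fa"


def control_path_for(signal_path: str) -> str:
    """Return the path of the control FASTA paired with a signal FASTA.

    Assumption (from the naming scheme above): basic and 3'UTR experiments
    pair X.signal.fa with X.control.fa, while decoy experiments, whose file
    stem carries the _rpEEE_rklFFF fields, pair it with X.decoy_control.fa.
    """
    path = PurePosixPath(signal_path)
    if not path.name.endswith(SIGNAL_SUFFIX):
        raise ValueError(f"not a signal file: {signal_path}")
    stem = path.name[: -len(SIGNAL_SUFFIX)]
    # Only decoy file names contain the control-motif fields "_rp" and "_rkl".
    is_decoy = "_rp" in stem and "_rkl" in stem
    suffix = ".decoy_control.fa" if is_decoy else ".control.fa"
    # Keep the directory part unchanged; only the file name is swapped.
    return str(path.with_name(stem + suffix))
```

For example, control_path_for("decoy/n100/len200/p0.5_kl1_rp0.5_rkl1.signal.fa") returns "decoy/n100/len200/p0.5_kl1_rp0.5_rkl1.decoy_control.fa".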

The meaning of the uppercase strings is as follows:

Code     Meaning
EXPNAME  Experiment name: PWM for the basic experiments, 3utr for the 3'UTR experiments, and decoy for the decoy experiments
AAA      Number of sequences
BBB      Length of sequences
CCC      Signal motif implantation frequency
DDD      Signal motif information content
EEE      Control motif implantation frequency (decoy experiments only)
FFF      Control motif information content (decoy experiments only)
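To filter or group datasets by parameter, the placeholder values can be recovered from a path with a regular expression. This is a minimal sketch assuming the paths follow the patterns above exactly; parse_path, DatasetFile and the example paths are hypothetical. Only AAA and BBB are converted to integers; the other values are kept as strings because the exact number formatting used in the archives is not stated here.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed pattern for EXPNAME/nAAA/lenBBB/pCCC_klDDD[_rpEEE_rklFFF].<kind>.fa
FILE_PATTERN = re.compile(
    r"(?P<exp>[^/]+)/n(?P<n>\d+)/len(?P<length>\d+)/"
    r"p(?P<p>[^_/]+)_kl(?P<kl>[^_/]+)"
    r"(?:_rp(?P<rp>[^_/]+)_rkl(?P<rkl>[^_/]+))?"  # present only for decoy files
    r"\.(?P<kind>signal|control|decoy_control)\.fa$"
)


@dataclass
class DatasetFile:
    experiment: str              # EXPNAME: PWM, 3utr, or decoy
    n_sequences: int             # AAA
    seq_length: int              # BBB
    signal_freq: str             # CCC
    signal_ic: str               # DDD
    control_freq: Optional[str]  # EEE, None outside the decoy experiments
    control_ic: Optional[str]    # FFF, None outside the decoy experiments
    kind: str                    # signal, control, or decoy_control


def parse_path(path: str) -> DatasetFile:
    """Extract the placeholder values from a dataset file path."""
    m = FILE_PATTERN.search(path)  # search() tolerates a leading directory prefix
    if m is None:
        raise ValueError(f"path does not follow the naming scheme: {path}")
    return DatasetFile(
        experiment=m["exp"],
        n_sequences=int(m["n"]),
        seq_length=int(m["length"]),
        signal_freq=m["p"],
        signal_ic=m["kl"],
        control_freq=m["rp"],
        control_ic=m["rkl"],
        kind=m["kind"],
    )
```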

Sequences

The sequences were generated as described in the publication. Background sequence is in lowercase and implanted motifs are in uppercase, as illustrated below:

>signal_0
cgttgtgcGCCACGCAaaag
>signal_1
taactttacGCCACCCActt
>signal_2
cacGCCACGGAggaggactc
>signal_3
TCCACGCAaatcaattcctt
>signal_4
atgtatgcgGTCACCGAgag
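Since the implanted motifs are marked only by case, their positions can be recovered by scanning each sequence for runs of uppercase characters. The sketch below illustrates this under that assumption, with hypothetical helpers read_fasta and implanted_sites. Two motifs implanted directly next to each other would be reported as one merged run.

```python
import re
from typing import Dict, List, Tuple

# Assumption: background is lowercase and implanted motifs are uppercase.
UPPER_RUN = re.compile(r"[A-Z]+")


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into an insertion-ordered {header: sequence} dict."""
    records: Dict[str, str] = {}
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def implanted_sites(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, motif) for every uppercase run in seq."""
    return [(m.start(), m.group()) for m in UPPER_RUN.finditer(seq)]
```

Applied to the five example sequences above, implanted_sites returns a single 8-nt run for each, for instance (8, 'GCCACGCA') for signal_0.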

Contact

If you have questions regarding these data, please refer to the publication, where you will find the email address of the corresponding author.