Welcome to Big Data Tool - BigFiRSt

If you find BigFiRSt helpful or see something that could be improved, please contact us.

Features

Data download

All the data in our database and the source code of BigFiRSt are available for download on the download page.

Entry page

Our entry page gives users a comprehensive overview of BigFiRSt and shows how to start using it quickly.

Long term maintenance

BigFiRSt is constantly being maintained and updated.

Friendly

Flow charts and explanatory text on the help page show users how to use the webserver and BigFiRSt.

Job access

Job entries can be searched in two ways: by Job ID or by email.

Concise interface

There are no unnecessary buttons or other elements that could confuse you.

Introduction

A big data tool for mining microsatellites from high-throughput DNA sequencing data.

General Introduction

To help users merge read pairs and identify SSRs (Simple Sequence Repeats) in small datasets, we provide this free webserver. To the best of our knowledge, no comparable web server is currently available to the research community. For very large datasets, users can download the source code of BigFiRSt (Big data based Flash and pErf algorithm for mining Ssr) from the download page.

FLASH (Fast Length Adjustment of SHort reads) is an accurate and fast tool for merging paired-end reads generated from DNA fragments that are shorter than twice the read length. Merging read pairs yields longer unpaired reads, which are generally preferred in genome assembly and genome analysis.
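
To make the merging idea concrete, here is a minimal Python sketch of overlap-based merging of one read pair. It is not FLASH's implementation: the names `reverse_complement` and `merge_pair`, the parameters `min_overlap` and `max_mismatch_ratio`, and their default values are assumptions chosen for illustration, and base quality scores are ignored (the bases of the first read are kept in the overlap).

```python
# Illustrative sketch only; names and thresholds are hypothetical, not FLASH's.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-case A/C/G/T/N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_ratio=0.25):
    """Merge a paired-end read pair into one longer read, or return None.

    read2 is reverse-complemented so both reads are on the same strand,
    then every overlap between the tail of read1 and the head of read2
    is scored by its mismatch ratio. The overlap with the lowest ratio
    wins; ties go to the longest overlap.
    """
    r2 = reverse_complement(read2)
    best = None  # (mismatch_ratio, overlap_length)
    for overlap in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail, head = read1[-overlap:], r2[:overlap]
        ratio = sum(a != b for a, b in zip(tail, head)) / overlap
        if ratio <= max_mismatch_ratio and (best is None or ratio < best[0]):
            best = (ratio, overlap)
    if best is None:
        return None  # no acceptable overlap: the pair stays unmerged
    # Keep read1 whole and append the part of read2 beyond the overlap.
    return read1 + r2[best[1]:]
```

For a 30 nt fragment sequenced with 20 nt reads from both ends, the two reads overlap by 10 nt, so `merge_pair(read1, read2, min_overlap=5)` would be expected to reconstruct the full fragment. This works only because the fragment is shorter than twice the read length, which is the condition stated above.
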
PERF is a Python package developed for fast and accurate identification of microsatellites in DNA sequences. Microsatellites, or SSRs, are short tandem repeats of 1-6 nt motifs. They are present in all genomes and have a wide range of uses and functional roles. Existing tools for SSR identification have one or more caveats in terms of speed, comprehensiveness, accuracy, ease of use, flexibility and memory usage. PERF was designed to address all of these problems.
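
The definition of an SSR as a tandem repeat of a 1-6 nt motif can be illustrated with a short Python sketch. This is not PERF's algorithm or API: the functions `is_primitive` and `find_ssrs`, and the 12 nt minimum tract length, are assumptions made for this example. The sketch looks for stretches in which every base equals the base `k` positions earlier, which is what a repeat with a motif of length `k` looks like.

```python
# Illustrative sketch only; function names and the 12 nt cutoff are hypothetical.

def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    size = len(motif)
    return all(motif != motif[:d] * (size // d)
               for d in range(1, size) if size % d == 0)


def find_ssrs(seq, max_motif=6, min_length=12):
    """Return sorted (start, end, motif, repeat_count) tuples for SSR tracts.

    Coordinates are 0-based and end-exclusive. A trailing partial copy of
    the motif is included in the tract but not in repeat_count.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for k in range(1, max_motif + 1):
        start = 0
        while start + k <= n:
            end = start + k
            # Extend the tract while each base matches the one k positions back.
            while end < n and seq[end] == seq[end - k]:
                end += 1
            motif = seq[start:start + k]
            if (end - start >= min_length
                    and set(motif) <= set("ACGT")
                    and is_primitive(motif)):
                hits.append((start, end, motif, (end - start) // k))
                start = end  # continue after the reported tract
            else:
                start += 1
    return sorted(hits)
```

The `is_primitive` check keeps a tract such as `ATATATATATAT` from being reported a second time with the motifs `ATAT` or `ATATAT`. A production tool would also handle details this sketch leaves out, such as per-motif-size length cutoffs and grouping of equivalent motifs on both strands.
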
The module integrates FLASH and PERF into a pipeline that takes short read pairs as input and returns the mined SSRs.
Next-generation sequencing (NGS) techniques, such as the Illumina platform, produce very large numbers of short read pairs in each run. Traditional stand-alone tools face challenges in merging short read pairs and identifying SSRs in downstream analyses of such large-scale data. BigFiRSt, a new Hadoop-based program suite, addresses this problem with big data technology. BigFiRSt consists of two major modules, BigFLASH and BigPERF, which are built on two state-of-the-art stand-alone tools, FLASH and PERF, respectively. BigFLASH merges short read pairs and BigPERF mines SSRs, each in a big data manner. BigFiRSt not only allows users to run BigFLASH and BigPERF separately, but also provides a pipeline function to run them consecutively.