7 Download GEO data
7.1 Download SRA files
- Download latest SRA-tool
wget --output-document sratoolkit.tar.gz\
http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-ubuntu64.tar.gz
sratoolkit.tar.gz
file will be downloaded.
tar -xvzf sratoolkit.tar.gz
export PATH=$PATH:/home/basu/sratoolkit.3.0.1-ubuntu64/bin
- Check whether the tool is working or not by executing the following command
prefetch -h
Now let us download the following data from GEO. Transcriptional profiling of adult retinal ganglion cells during optic nerve regeneration [GSE142881: RGC injury dataset]
Create a script file with vim editor
download.sh
and paste the following script into that.
vim download.sh
Note
vim download.sh → insert (press I in your keyboard) → esc → :wq → chmod +x download.sh
Bash Script
#!/bin/sh
# Downloading the SRA files using Prefetch command
for i in $(seq 10821165 10821205)
do
prefetch -v SRR$i --max-size 50G
done
chmod +x download.sh
- Execute the script by typing
nohup bash download.sh
7.2 Convert .sra file to .fastq files
mkdir fastq/
- Make another script
loop.sh
to convert .sra file to .fastq file
vim loop.sh
Bash script
#!/bin/sh
# convert using fasterq-dump command
for i in $(seq 10821165 10821205)
do
fasterq-dump --split-files --skip-technical -e 16 SRR$i --outdir fastq
done
chmod +x loop.sh
nohup bash loop.sh
7.3 Make gzip compression of the fastq files
cd fastq/
gzip *.fastq
Note
The next step would be checking the quality of the fastq files. See Chapter 4 for the details of fastqc commands and quality check. I would recommend to refer to that section before proceeding to quality control of fastq data.
- In the next section we are going to install packages and dependencies.