- Converting FASTQ file to FASTA file
- Convert SAM file to BAM
- Sort BAM file
- Index BAM file
- Get subset of sequence from FASTA file
- Get particular record from multi-FASTA file
- Filter records based on the sequence length in FASTQ
- Local BLAST output format options
- Removing new lines from multi-FASTA file
- Filtering reads over 11Kb in length
- Removing duplicate lines
- Create a histogram of a list of numbers
- Convert lowercase FASTA records to uppercase
- Compressing and indexing VCF file
- Sorting a VCF file based on chromosome and position
- Count the number of reads in a FASTQ file
- Docker post-installation steps
sed -n '1~4s/^@/>/p;2~4p' input.fastq > output.fasta
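The 1~4 address form is a GNU sed extension, so the sed line above may fail with BSD/macOS sed. A portable sketch using awk's line counter NR instead, assuming the standard four-line FASTQ layout (no wrapped sequence or quality lines). The function name fastq_to_fasta and the two reads are hypothetical.

```shell
# fastq_to_fasta is a hypothetical helper name; assumes 4 lines per FASTQ record.
fastq_to_fasta() {
  # line 1 of each record: turn the leading @ into >; line 2: the sequence
  awk 'NR % 4 == 1 { sub(/^@/, ">"); print } NR % 4 == 2 { print }' "$@"
}

# Toy input with two made-up reads
fasta=$(printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' | fastq_to_fasta)
printf '%s\n' "$fasta"
```

Real usage would be fastq_to_fasta input.fastq > output.fasta.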
samtools view -b input.sam > output.bam
samtools sort input.bam > output.sorted.bam
samtools index input.sorted.bam
NOTE: This generates the index file input.sorted.bam.bai.
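awk -v start="$start" -v end="$end" -v name="name_here" '$1 == (">" name) {getline seq; print substr(seq, start, end - start + 1)}' input_sequence.fasta
NOTE: change values for start and end (1-based, inclusive) and name_here accordingly. This assumes each sequence is on a single line.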
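A small check of the coordinate convention, assuming 1-based inclusive start/end: awk's substr(s, m, n) takes a length n, so bases start..end need n = end - start + 1. The records below are hypothetical, and sequences are assumed to sit on a single line.

```shell
# Toy single-line FASTA with made-up records
tmp=$(mktemp)
printf '>seq1\nAAAAAAAA\n>seq2\nACGTTGCA\n' > "$tmp"

# Match the header's first word exactly, read the next line, cut bases 3..6
subseq=$(awk -v start=3 -v end=6 -v name="seq2" \
  '$1 == (">" name) { getline seq; print substr(seq, start, end - start + 1) }' "$tmp")
rm -f "$tmp"
echo "$subseq"
```

Here bases 3 to 6 of ACGTTGCA are printed, i.e. GTTG.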
awk '/^>contig_1$/ {print;getline;print}' multi.fasta
NOTE: change contig_1 accordingly. This prints only the one line after the header, so it assumes the sequence is on a single line.
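If the sequences are wrapped over several lines, a flag-based sketch works instead: the flag is switched on at the wanted header and off at the next header, and every line is printed while the flag is set. It compares the first word of the header, so descriptions after the ID are allowed. Record names and sequences here are hypothetical.

```shell
# Toy multi-FASTA where contig_1 is wrapped over two lines
tmp=$(mktemp)
printf '>contig_1 len=8\nACGT\nACGT\n>contig_2\nTTTT\n' > "$tmp"

# keep is recomputed on every header line; bare "keep" prints the line when true
record=$(awk -v id="contig_1" '/^>/ { keep = ($1 == (">" id)) } keep' "$tmp")
rm -f "$tmp"
printf '%s\n' "$record"
```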
awk 'BEGIN {FS = "\t"; OFS = "\n"} {header = $0; getline seq; getline qheader; getline qseq; if (length(seq) >= 11000) {print header, seq, qheader, qseq}}' < input.fastq > filtered.fastq
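The same record-by-record filter can take the threshold as a parameter instead of hard-coding it. A sketch, assuming four-line FASTQ records; the function name filter_fastq_by_length, the min_len variable and the toy reads are hypothetical.

```shell
# filter_fastq_by_length MIN: hypothetical helper reading FASTQ on stdin
filter_fastq_by_length() {
  awk -v min_len="$1" '
    {
      header = $0
      getline seq; getline qheader; getline qual
      # keep the whole 4-line record only if the sequence is long enough
      if (length(seq) >= min_len) print header "\n" seq "\n" qheader "\n" qual
    }'
}

kept=$(printf '@short\nACG\n+\nIII\n@long\nACGTACGT\n+\nIIIIIIII\n' | filter_fastq_by_length 5)
printf '%s\n' "$kept"
```

Only the 8-base read passes a threshold of 5; filter_fastq_by_length 11000 < input.fastq reproduces the fixed-threshold version.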
Syntax for blastn:
blastn -db {db_name} -query {query.fasta} -out {output_file} -outfmt {output_format} -num_threads {num_threads}
OUTPUT FORMAT
Alignment view options (-outfmt):
0 = pairwise
1 = query-anchored showing identities
2 = query-anchored no identities
3 = flat query-anchored, show identities
4 = flat query-anchored, no identities
5 = XML BLAST output
6 = tabular
7 = tabular with comment lines
8 = Text ASN.1
9 = Binary ASN.1
10 = Comma-separated values
11 = BLAST archive format (ASN.1)
Options 6, 7, and 10 can be additionally configured to produce a custom format
specified by space delimited format specifiers. The supported format
specifiers are:
qseqid means Query Seq-id
qgi means Query GI
qacc means Query accession
qaccver means Query accession.version
qlen means Query sequence length
sseqid means Subject Seq-id
sallseqid means All subject Seq-id(s), separated by a ';'
sgi means Subject GI
sallgi means All subject GIs
sacc means Subject accession
saccver means Subject accession.version
sallacc means All subject accessions
slen means Subject sequence length
qstart means Start of alignment in query
qend means End of alignment in query
sstart means Start of alignment in subject
send means End of alignment in subject
qseq means Aligned part of query sequence
sseq means Aligned part of subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive-scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive-scoring matches
frames means Query and subject frames separated by a '/'
qframe means Query frame
sframe means Subject frame
btop means Blast traceback operations (BTOP)
staxids means Subject Taxonomy ID(s), separated by a ';'
sscinames means Subject Scientific Name(s), separated by a ';'
scomnames means Subject Common Name(s), separated by a ';'
sblastnames means Subject Blast Name(s), separated by a ';' (in alphabetical order)
sskingdoms means Subject Super Kingdom(s), separated by a ';' (in alphabetical order)
stitle means Subject Title
salltitles means All Subject Title(s), separated by a '<>'
sstrand means Subject Strand
qcovs means Query Coverage Per Subject
qcovhsp means Query Coverage Per HSP
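Custom tabular output is convenient to post-filter with awk, because the columns come out in the order the specifiers were listed. A sketch assuming a run with -outfmt "6 qseqid sseqid pident length evalue bitscore", so pident is column 3, evalue column 5 and bitscore column 6; the hits below are made up, and the identity/E-value cut-offs are illustrative. sort -g (general numeric, handles values like 1.2e+03) is a GNU sort option.

```shell
# Hypothetical output of:
#   blastn -db {db_name} -query {query.fasta} -out hits.tsv \
#          -outfmt "6 qseqid sseqid pident length evalue bitscore"
hits=$(mktemp)
printf 'q1\ts1\t99.5\t200\t1e-90\t350\nq1\ts2\t85.0\t180\t2e-40\t150\nq2\ts3\t97.0\t150\t0.5\t30\n' > "$hits"

# keep hits with >= 90% identity and E-value <= 1e-5 (columns 3 and 5)
good=$(awk -F '\t' '$3 >= 90 && $5 <= 1e-5' "$hits")

# best hit per query: sort by qseqid, then bitscore descending, keep first row per query
best=$(sort -t "$(printf '\t')" -k1,1 -k6,6gr "$hits" | awk -F '\t' '!seen[$1]++ { print $1, $2 }')
rm -f "$hits"
printf '%s\n' "$good" "$best"
```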
awk '/^[>;]/ { if (seq) { print seq }; seq=""; print } /^[^>;]/ { seq = seq $0 } END { print seq }' input_file.fasta > outputfile.fasta
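A quick check of the one-liner above on a toy wrapped multi-FASTA, followed by one thing the single-line layout makes easy: per-record sequence lengths in a single pass. The wrapper name linearize_fasta and the records are hypothetical.

```shell
# linearize_fasta: hypothetical wrapper around the one-liner above
linearize_fasta() {
  awk '/^[>;]/ { if (seq) { print seq }; seq=""; print } /^[^>;]/ { seq = seq $0 } END { print seq }' "$@"
}

# Toy input: each sequence is split over two lines
flat=$(printf '>a\nACGT\nAC\n>b\nGG\nTT\n' | linearize_fasta)

# With one line per sequence, print "name length" for every record
lengths=$(printf '%s\n' "$flat" | awk '/^>/ { name = substr($1, 2); next } { print name, length($0) }')
printf '%s\n' "$flat" "$lengths"
```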
awk 'BEGIN {FS = "\t" ; OFS = "\n"} {header = $0 ; getline seq ; getline qheader ; getline qseq ; if (length(seq) >= 11000) {print header,seq,qheader,qseq}}' < input.fastq > output.fastq
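awk '!x[$0]++' file > output_file
NOTE: keep the program in single quotes, otherwise the shell expands $1, treats [ ] as a glob and (interactively) ! as history expansion. This keeps the first occurrence of each line in its original order; use $1 instead of $0 to deduplicate on the first column only.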
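How the idiom works: x[key]++ returns the previous count, which is 0 (false) the first time a key is seen, so !x[key]++ is true only for first occurrences. Unlike sort -u, the input order is preserved. A small demonstration on made-up lines, keyed on the whole line and on the first field:

```shell
# Whole-line deduplication: first occurrences kept, original order preserved
dedup=$(printf 'b\na\nb\nc\na\n' | awk '!x[$0]++')

# Deduplication on the first column only
by_key=$(printf 'k1 x\nk2 y\nk1 z\n' | awk '!x[$1]++')
printf '%s\n' "$dedup" "$by_key"
```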
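awk -v size=20 '{ b = int($1/size); a[b]++; if (NR == 1 || b > bmax) bmax = b; if (NR == 1 || b < bmin) bmin = b } END { for (i = bmin; i <= bmax; ++i) print i*size, (i+1)*size, a[i]+0 }' file
NOTE: change bin size accordingly. Values are read from the first column and assumed to be non-negative; each output line is bin start, bin end, count.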
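A sketch of the same binning on toy numbers, with two details made explicit: bmin and bmax are initialised from the first value (uninitialised awk variables start at 0, which would add empty bins from 0 upwards), and empty bins print 0 via a[i]+0 rather than a blank. The bin size of 10 and the values are illustrative; int() truncates toward zero, so negative values would not bin correctly.

```shell
# Toy data: 45 and 47 fall in bin [40,50), 63 in [60,70), nothing in [50,60)
hist=$(printf '45\n47\n63\n' | awk -v size=10 '
  { b = int($1/size); a[b]++
    if (NR == 1 || b > bmax) bmax = b
    if (NR == 1 || b < bmin) bmin = b }
  END { for (i = bmin; i <= bmax; ++i) print i*size, (i+1)*size, a[i]+0 }')
printf '%s\n' "$hist"
```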
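awk 'BEGIN{FS=" "}{if(!/>/){print toupper($0)}else{print $1}}' input.fna > output.fna
NOTE: header lines are cut down to their first word (print $1); use print $0 instead to keep the full header.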
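A variant that upper-cases only the sequence lines and passes header lines through unchanged, descriptions included. The record below is made up.

```shell
# Headers (lines starting with >) are printed as-is; all other lines are upper-cased
up=$(printf '>seq1 some description\nacgtn\nACgt\n' | awk '/^>/ { print; next } { print toupper($0) }')
printf '%s\n' "$up"
```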
bgzip -c file.vcf > file.vcf.gz
tabix -p vcf file.vcf.gz
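(grep '^#' input.vcf; grep -v '^#' input.vcf | sort -k1,1V -k2,2n) > output.vcf
NOTE: header lines (starting with #) are written first in their original order; sorting them together with the records can move ##fileformat off the first line. The -k1,1V option tells sort to sort by the first column using "version" sort, a natural sort of numbers within text (so chr2 comes before chr10, a GNU sort feature); -k2,2n then sorts by position numerically.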
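A toy check of the chromosome/position ordering, keeping the header out of the sort. The function name sort_vcf and the three records are hypothetical; -V requires GNU sort.

```shell
# sort_vcf FILE: hypothetical helper; headers first, then records sorted by CHROM, POS
sort_vcf() {
  grep '^#' "$1"
  grep -v '^#' "$1" | sort -k1,1V -k2,2n
}

tmp=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr10\t5\t.\tA\tG\t.\t.\t.\nchr2\t300\t.\tC\tT\t.\t.\t.\nchr2\t40\t.\tG\tA\t.\t.\t.\n' > "$tmp"
sorted=$(sort_vcf "$tmp")
rm -f "$tmp"
printf '%s\n' "$sorted" | cut -f1,2
```

The records come out as chr2:40, chr2:300, chr10:5, and ##fileformat stays on line 1.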
echo "$(( $(wc -l < your_file.fastq) / 4 ))"
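The count works because each FASTQ record spans exactly four lines (wrapped FASTQ would break this). For a gzip-compressed file the same arithmetic can be applied to the decompressed stream. A sketch on a toy two-read file; the temporary files stand in for your_file.fastq and your_file.fastq.gz.

```shell
# Toy FASTQ with two made-up reads, plus a gzip-compressed copy
tmp=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' > "$tmp"
gzip -c "$tmp" > "$tmp.gz"

n_plain=$(( $(wc -l < "$tmp") / 4 ))
n_gz=$(( $(gzip -dc "$tmp.gz" | wc -l) / 4 ))   # decompress to stdout, then count
rm -f "$tmp" "$tmp.gz"
echo "$n_plain $n_gz"
```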
- Create a docker group:
sudo groupadd docker
- Add your user to the docker group:
sudo usermod -aG docker $USER
- Log out and log back in or activate the changes to groups by running:
newgrp docker