A quick and somewhat untested pipeline to run cutadapt.
This will take paired reads corresponding to a single genome (already demultiplexed) and perform quality filtering and adapter trimming.
input (2 files):
- forward reads
- reverse reads
output:
- forward reads with no adapter found
tmp/{strain}/{strain}.trim_adapter.f.fq
- reverse reads with no adapter found
tmp/{strain}/{strain}.trim_adapter.r.fq
- forward reads where adapter was found or quality was low
tmp/{strain}/{strain}.trim_adapter.f.discard.fq
- reverse reads where adapter was found or quality was low
tmp/{strain}/{strain}.trim_adapter.r.discard.fq
- (inscrutable) log file
tmp/{strain}/{strain}.trim_adapter.log.txt
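
As a rough illustration of the naming scheme above, the sketch below expands `{strain}` into the five output paths. The function name `output_paths`, the dictionary keys, and the `root` parameter are made up for this example and are not taken from the Snakefile.

```python
from pathlib import Path

# Suffixes copied from the output list above; the keys are illustrative names only.
OUTPUT_SUFFIXES = {
    "forward_no_adapter": "trim_adapter.f.fq",
    "reverse_no_adapter": "trim_adapter.r.fq",
    "forward_discard": "trim_adapter.f.discard.fq",
    "reverse_discard": "trim_adapter.r.discard.fq",
    "log": "trim_adapter.log.txt",
}


def output_paths(strain, root="tmp"):
    """Return the expected output paths for one strain (hypothetical helper)."""
    base = Path(root) / strain
    return {key: base / f"{strain}.{suffix}" for key, suffix in OUTPUT_SUFFIXES.items()}
```

Calling `output_paths("my_strain")` gives a dictionary with one `Path` per output file, which can be handy for checking that a run produced everything it should.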
### Notes:
- I added `XXXX` to the end of the adapters we were trimming to avoid internal matches. See http://cutadapt.readthedocs.io/en/stable/recipes.html#avoid-internal-adapter-matches
- I used `--untrimmed-output` and `--untrimmed-paired-output` to get all the reads that had no adapter found in a separate file. I did this for debugging purposes. The default, if you leave those off, is to put the trimmed reads and the reads that were fine without trimming in the same file. So the file I call "discard" would actually be the file you keep: it would contain both the reads where the adapter was removed and the untouched reads.
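
To make the two notes above concrete, here is a minimal sketch of how such a cutadapt call could be assembled. It is not the command from the Snakefile: it assumes the adapters are 3' adapters passed with `-a`/`-A`, the adapter sequence `ACGTACGT` is a placeholder, the reverse input file name is a guess by analogy with the forward one, and the quality-filtering options are left out because this README does not say which ones are used. The function name `cutadapt_command` is hypothetical.

```python
import shlex


def cutadapt_command(adapter_fwd, adapter_rev, in_fwd, in_rev,
                     trimmed_fwd, trimmed_rev, untrimmed_fwd, untrimmed_rev,
                     pad="XXXX"):
    """Build a paired-end cutadapt argument list (illustrative sketch only)."""
    return [
        "cutadapt",
        # Padding the adapter with X characters discourages internal matches.
        "-a", adapter_fwd + pad,
        "-A", adapter_rev + pad,
        # Read pairs in which no adapter was found go to these two files.
        "--untrimmed-output", untrimmed_fwd,
        "--untrimmed-paired-output", untrimmed_rev,
        # Read pairs in which an adapter was found and trimmed go here.
        "-o", trimmed_fwd,
        "-p", trimmed_rev,
        in_fwd, in_rev,
    ]


strain = "my_strain"  # placeholder strain name
example = cutadapt_command(
    "ACGTACGT", "ACGTACGT",  # placeholder adapter sequences
    f"input/{strain}/{strain}.raw_reads.f.fq",
    f"input/{strain}/{strain}.raw_reads.r.fq",  # assumed reverse file name
    f"tmp/{strain}/{strain}.trim_adapter.f.discard.fq",
    f"tmp/{strain}/{strain}.trim_adapter.r.discard.fq",
    f"tmp/{strain}/{strain}.trim_adapter.f.fq",
    f"tmp/{strain}/{strain}.trim_adapter.r.fq",
)
print(shlex.join(example))
```

Dropping the two `--untrimmed-*` options from the list would send both trimmed and untouched reads to the `-o`/`-p` files, which is the default behaviour described in the note.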
folder structure:

    input
      |
      -> {strain}
           |
           -> {strain}.raw_reads.f.fq
    Snakefile
    config.yaml