Rapid Real-time Squiggle Classification for Read Until Using RawMap
Author(s): Harisankar Sadasivan, Jack Wadden, Kush Goliya, Piyush Ranjan, Robert P. Dickson, David Blaauw, Reetuparna Das and Satish Narayanasamy
Read Until enables Oxford Nanopore Technologies' (ONT) sequencers to selectively sequence reads of target species in real time. This enables efficient microbial enrichment for applications such as microbial abundance estimation and is particularly beneficial for metagenomic samples with a very high fraction of non-target reads (more than 99% can be human reads). However, Read Until requires a fast and accurate software filter that analyzes a short prefix of a read and determines whether it belongs to a microbe of interest (target) or not. The baseline Read Until pipeline uses a deep neural network-based basecaller called Guppy, which is slow and inaccurate for this task (~60% of bases sequenced are unclassified). We present RawMap, an efficient CPU-only, microbial species-agnostic Read Until classifier that filters non-target human reads in the squiggle space. RawMap uses a Support Vector Machine (SVM) trained to distinguish human from microbial reads using non-linear and non-stationary characteristics of ONT's squiggle output (continuous electrical signals). Compared to the baseline Read Until pipeline, RawMap is a 1327× faster classifier and yields significant savings in sequencing time, sequencing cost, and compute time. We show that RawMap-augmented pipelines reduce sequencing time and cost by ~24% and computing cost by 22%. Additionally, since RawMap is agnostic to microbial species, it can also classify microbial species it was not trained on. We also discuss how RawMap may be used as an alternative to the RT-PCR test for viral load quantification of SARS-CoV-2.
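To illustrate the kind of decision such a filter makes, the following is a minimal sketch in Python of scoring a short squiggle prefix with a linear SVM decision function and then keeping or ejecting the read. It is not RawMap's implementation: the prefix length, the three summary features, the weights, the bias, and all function names are hypothetical placeholders chosen only to make the example runnable, and the features RawMap actually extracts from the squiggle are not reproduced here.

```python
import statistics

# Hypothetical prefix length; real Read Until prefixes are far longer.
PREFIX_SAMPLES = 8

# Hypothetical parameters standing in for an already-trained linear SVM.
# One weight per feature: (mean, standard deviation, mean absolute step).
WEIGHTS = (0.0, 1.0, -2.0)
BIAS = 0.0


def prefix_features(signal):
    """Summarize the first PREFIX_SAMPLES samples with placeholder statistics."""
    prefix = signal[:PREFIX_SAMPLES]
    if len(prefix) < 2:
        raise ValueError("need at least two samples to compute features")
    # Absolute differences between consecutive samples.
    steps = [abs(b - a) for a, b in zip(prefix, prefix[1:])]
    return (
        statistics.fmean(prefix),
        statistics.pstdev(prefix),
        statistics.fmean(steps),
    )


def svm_score(features):
    """Linear SVM decision value: dot(w, x) + b."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def read_until_decision(signal):
    """Return 'keep' for a positive score (treated as target), else 'eject'."""
    return "keep" if svm_score(prefix_features(signal)) > 0 else "eject"


if __name__ == "__main__":
    print(read_until_decision([0, 0, 0, 0, 4, 4, 4, 4]))
    print(read_until_decision([0, 4, 0, 4, 0, 4, 0, 4]))
```

Only the prefix is inspected, so the decision cost does not grow with read length; that is the property a real-time filter relies on, whatever features and classifier parameters are actually used.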