Ok, anthrax might sound a bit scary, but this is a story about something that should make you feel good.

Wait, is that really Anthrax?

Back in the Spring of 2015, researchers studying the microbial ecology of the built environment generated a very large amount of genomic data from microbes found in the New York City subway system. To give you some idea of the magnitude, they generated 10.4 billion sequence reads across 1,457 samples. That's a lot of microbiome data.

The first pass over the data reported that there were two samples possibly containing a small amount of the bacterium that causes anthrax, B. anthracis (0.06 - 0.25%). Now, B. anthracis is closely related to its relatively harmless cousin B. cereus, and so there was a lot of discussion at the time that this was (hopefully) a false alarm that may have been caused by low-level instrument error.

Either way, the general consensus was that it needed a closer look.

A Targeted Detection Tool

With the NYC subway samples in mind, we built a tool to detect B. anthracis and distinguish it from any other Bacillus species in a sample. The tool incorporates some genetic markers developed by Timothy Read's group at Emory (Petit III, 2015), as well as some markers we assembled to measure the total amount of all Bacillus species present.

Here's an example of what this detection tool looks like for a true B. anthracis sample:

You can see that the tool has a number of different panels. Each of those panels shows a different region of the B. anthracis genome. The panel at the bottom shows the abundance of all Bacillus species in the sample.

No, it's not Anthrax (phew)

You can read a full summary of how we built the tool here, but the main take-home is that while the NYC subway samples had a small number of reads that resembled B. anthracis, they had a large number of reads from other Bacillus species. Moreover, when the Mason lab re-sequenced those samples at a much higher depth for QC (dark orange point), the number of putative B. anthracis reads did increase, but the set of mutations found at the lower depth did not increase proportionally. Instead, we observed a different set of "Ba-like" mutations at higher depth. A real B. anthracis signal should show up at the same marker positions no matter how deeply the sample is sequenced, so this indicates that the Ba-like SNPs seen at lower depth may have been genotyping noise. In fact, the number of reads resembling B. anthracis in those samples (orange points in the figure below) was right in the middle of the range expected from potential genotyping error (blue points) or from related strains of B. cereus.

Detection of B. anthracis across 103 Bacillus isolates and metagenomic samples. Red points are Bacillus anthracis, blue points are other Bacillus species, purple points are synthetic metagenomic samples with B. anthracis spike-ins (0.1X – 2.0X), and orange points are 3 samples (2 sites) from the subway PathoMAP project (Afshinnekoo et al., 2015a). For display, values of 0 were rounded up to 1e-4.
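
To make the re-sequencing argument a bit more concrete, here is a minimal sketch in Python of the kind of check it implies: do the marker positions seen at low depth turn up again at high depth? This is not the code behind our tool. The function name marker_recurrence and all of the marker positions are made up for illustration.

```python
def marker_recurrence(low_depth, high_depth):
    """Fraction of markers seen at low depth that are seen again at high depth."""
    low, high = set(low_depth), set(high_depth)
    if not low:
        return 0.0
    return len(low & high) / len(low)


# Hypothetical marker positions, for illustration only (not real subway data).
# A true signal: deeper sequencing confirms the same markers and adds more.
true_signal_low = {101, 245, 387}
true_signal_high = {101, 245, 387, 512, 640}

# Noise: deeper sequencing yields more "Ba-like" markers, but different ones.
noise_low = {101, 245, 387}
noise_high = {58, 199, 733, 804}

true_recurrence = marker_recurrence(true_signal_low, true_signal_high)
noise_recurrence = marker_recurrence(noise_low, noise_high)
print(true_recurrence, noise_recurrence)
```

In this toy example the true signal recurs completely and the noise does not recur at all. Real data would fall somewhere in between, so a check like this is only one piece of evidence alongside the comparison against other Bacillus species.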

Just for comparison, here is the result for one of the NYC Subway samples, where you can see the evidence for B. anthracis is vastly outweighed by the evidence for other Bacillus species (in the bottom panel):

So, in the end, we are happy to report that our targeted tool, together with the additional data and analysis from the Mason lab, supports the conclusion that this was indeed a false alarm. You can read more from Prof. Mason here: microBEnet blog post.

Also, we're happy to say that anyone else can use this tool as well to check their samples. Any sample uploaded to One Codex with reads resembling B. anthracis will be automatically run through this tool.

Please feel free to get in touch if you have any questions.