---
title: 'Binette: a fast and accurate bin refinement tool to construct high quality Metagenome Assembled Genomes.'
tags:
  - Python
  - Metagenomics
  - Binning
  - Bin refinement
  - MAGs

authors:
  - name: Jean Mainguy
    orcid: 0009-0006-9160-9744
    affiliation: "1, 2"
  - name: Claire Hoede
    orcid: 0000-0001-5054-7731
    affiliation: "1, 2"
    corresponding: true
affiliations:
 - name: Université de Toulouse, INRAE, BioinfOmics, GenoToul Bioinformatics facility, 31326, Castanet-Tolosan, France
   index: 1
 - name: Université de Toulouse, INRAE, UR 875 MIAT, 31326, Castanet-Tolosan, France
   index: 2
date: 30 November 2023
bibliography: paper.bib
---


# Statement of need
Metagenomics enables the study of microbial communities and their individual members through shotgun sequencing. An essential phase of metagenomic analysis is the recovery of metagenome-assembled genomes (MAGs). MAGs serve as a gateway to additional analyses, including the exploration of organism-specific metabolic pathways, and form the basis for comprehensive large-scale metagenomic surveys [@Nayfach2019global_human_gut_microbiome;@Acinas_Sánchez_et_al_2021]. 

In a metagenomic analysis, sequence reads are first assembled into longer sequences called contigs. These contigs are then grouped into bins based on shared characteristics, in a process called binning, to obtain MAGs. Several tools can be used to bin contigs into MAGs. They rely on various statistical and machine learning methods and use contig characteristics such as tetranucleotide frequencies, GC content and abundance profiles across samples [@kang2019metabat;@alneberg2014concoct;@nissen2021improved]. 

Applying multiple binning methods and combining their results has proven useful for obtaining more and higher-quality MAGs from metagenomic datasets. This combination process is called bin refinement, and several tools exist to perform it, such as DAS Tool [@sieber2018dastool], MAGScoT [@ruhlemann2022magscot] and the bin refinement module of the metaWRAP pipeline [@uritskiy2018metawrap]. Of these, metaWRAP's bin refinement tool has demonstrated remarkable efficiency in benchmark analysis [@meyer2022critical]. However, it has certain limitations, most notably its inability to integrate more than three binning results. In addition, it repeatedly uses CheckM [@parks2015checkm] to assess bin quality throughout its execution, which contributes to its slower performance. Furthermore, since it is embedded in a larger framework, it can be difficult to integrate into an independent analysis pipeline.

We present Binette, a bin refinement tool inspired by metaWRAP's bin refinement module. Binette addresses the limitations listed above and is designed to produce higher-quality bins.

# Summary
Binette is a Python reimplementation and enhanced version of the bin refinement module used in metaWRAP. It takes as input sets of bins generated by various binning tools. From these input bin sets, Binette constructs new hybrid bins using basic set operations. Specifically, a bin can be defined as a set of contigs, and when two or more bins share at least one contig, Binette generates new bins based on their intersection, difference, and union (\autoref{fig:overview}.A). This approach differs from that of metaWRAP, which generates hybrid bins exclusively from bin intersections. By also considering differences and unions, Binette expands the range of possible bins.
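
The set operations described above can be illustrated with a minimal Python sketch. It is not Binette's actual implementation: the function name `hybrid_bins`, the contig identifiers, and the restriction to a single pair of bins are assumptions made for illustration, as is the choice to compute the difference in both directions.

```python
def hybrid_bins(bin_a: frozenset, bin_b: frozenset) -> set:
    """Return candidate bins derived from two bins that share contigs.

    Hypothetical helper: a bin is modelled as a frozenset of contig names.
    """
    # Bins that share no contig do not produce any hybrid bin.
    if not bin_a & bin_b:
        return set()
    candidates = {
        bin_a & bin_b,  # intersection: contigs found in both bins
        bin_a - bin_b,  # difference: contigs only in the first bin
        bin_b - bin_a,  # difference: contigs only in the second bin
        bin_a | bin_b,  # union: contigs found in either bin
    }
    # An empty difference (one bin nested in the other) is not a bin.
    candidates.discard(frozenset())
    return candidates


# Two hypothetical bins produced by different binning tools.
first_bin = frozenset({"contig_1", "contig_2", "contig_3"})
second_bin = frozenset({"contig_2", "contig_3", "contig_4"})
new_bins = hybrid_bins(first_bin, second_bin)
```

In this example the two input bins share `contig_2` and `contig_3`, so four candidate bins are produced: the intersection, the two differences, and the union.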


![**Overview of Binette Steps**. **(A) Intermediate Bin Creation Example**: Bins are represented as square shapes, each containing colored lines representing the contigs they contain. Creation of intermediate bins involves the initial bins sharing at least one contig. Set operations are applied to the contigs within the bins to generate these intermediate bins. **(B) Binette Workflow Overview**: Input bins serve as the basis for generating intermediate bins. Each bin undergoes a scoring process utilizing quality metrics provided by CheckM2. Subsequently, the bins are sorted based on their scores, and a selection process is executed to retain non-redundant bins.\label{fig:overview}](./binette_overview.pdf)


Bin completeness and contamination are assessed using CheckM2 [@chklovski2023checkm2]. Bins are scored using the following scoring function: $\text{completeness} - \text{weight} \times \text{contamination}$, with the default weight set to 2. The scored bins are then sorted, and a final set of non-redundant bins is selected (\autoref{fig:overview}.B). Bin scoring relies on CheckM2 rather than on CheckM1, which is used in the metaWRAP pipeline. CheckM2 evaluates bin quality with a novel approach based on machine learning techniques, which improves speed and also provides better results than CheckM1. Binette runs the initial steps of CheckM2 only once, for all contigs within the input bins. These initial steps consist of gene prediction using Prodigal [@hyatt2010prodigal] and alignment against the CheckM2 database using Diamond [@buchfink2015diamond]. For gene prediction, Binette uses Pyrodigal [@larralde2022pyrodigal], a Python module that uses Cython to provide bindings to Prodigal. The intermediate CheckM2 results are then reused to assess the quality of individual bins, eliminating redundant calculations and speeding up the refinement process.
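
The scoring and selection steps can be sketched as follows. Only the scoring function and its default weight come from the description above. The greedy selection strategy, the definition of redundancy as "sharing at least one contig with an already selected bin", the function names, and the quality values are assumptions made for illustration and may differ from what Binette actually does.

```python
def score(completeness: float, contamination: float, weight: float = 2.0) -> float:
    """Scoring function from the text: completeness - weight * contamination."""
    return completeness - weight * contamination


def select_non_redundant(bins: dict) -> list:
    """Greedily keep the best-scoring bins that share no contig (assumed strategy).

    `bins` maps a bin name to a tuple (contigs, completeness, contamination).
    """
    # Sort bin names from the highest to the lowest score.
    ranked = sorted(
        bins,
        key=lambda name: score(bins[name][1], bins[name][2]),
        reverse=True,
    )
    used_contigs = set()
    selected = []
    for name in ranked:
        contigs = bins[name][0]
        # Keep a bin only if none of its contigs is already in a selected bin.
        if used_contigs.isdisjoint(contigs):
            selected.append(name)
            used_contigs |= contigs
    return selected


# Hypothetical candidate bins with made-up quality values (in percent).
candidate_bins = {
    "union": (frozenset({"c1", "c2", "c3", "c4"}), 95.0, 10.0),  # score 75.0
    "intersection": (frozenset({"c2", "c3"}), 80.0, 1.0),        # score 78.0
    "difference": (frozenset({"c1"}), 10.0, 0.0),                # score 10.0
}
selected = select_non_redundant(candidate_bins)
```

Here the intersection bin scores highest and is kept first. The union bin is then discarded because it shares contigs with it, while the difference bin is kept because it does not.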

Binette serves as the bin refinement tool within the [metagWGS](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs) metagenomic analysis pipeline [@metagWGS_inprep], providing a robust and faster alternative to the bin refinement module of the metaWRAP pipeline as well as other similar bin refinement tools.

# Availability

Binette is available on [PyPI](https://pypi.org/project/Binette/) and can be installed with standard Python package management tools. Additionally, a dedicated Conda package is available in the Bioconda channel [@gruning2018bioconda]. The source code for Binette is available on [GitHub](https://github.com/genotoul-bioinfo/binette) under the MIT license. The GitHub repository includes continuous integration tests and test coverage reporting, and employs continuous deployment through GitHub Actions to maintain a robust and reliable codebase.


# Acknowledgements

We would like to thank Matthias Zytnicki for his valuable insights and support during the development of the Binette algorithm.


# References
