Bin merging

If EukCC is run in folder mode, it can try to merge two or more bins to create a refined, merged bin of higher completeness.

For this you can and should pass paired-read information to EukCC, so that only bins linked by at least 100 (default) reads are considered for merging. This greatly improves speed and accuracy.

Preparing your linked reads

If you have paired-end read data, you should create a sorted alignment. If you have multiple read files, you can create multiple BAM files.

For this you will need the contigs that were used to create these bins. Alternatively, merge all bins into a pseudo-assembly file:

cat binfolder/*.fa > pseudo_contigs.fasta
bwa index pseudo_contigs.fasta
bwa mem -t 8 pseudo_contigs.fasta reads_1.fastq.gz reads_2.fastq.gz  |
    samtools view -q 20 -Sb - |
    samtools sort -@ 8 -O bam - -o alignment.bam
samtools index alignment.bam

You can then create a bin-linking table using the script provided with EukCC:

binlinks.py  --ANI 99 --within 1500 \
    --out linktable.csv binfolder alignment.bam

If you have multiple BAM files, pass all of them to the script (e.g. *.bam).

You will obtain a three-column file (bin_1, bin_2, links).
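As a quick sanity check before merging, you can list the bin pairs in the link table that reach the 100-read threshold. The following is a minimal sketch, assuming the table is comma-separated with a single header line and the three columns named above. The sample rows, bin names, and the MIN_LINKS variable are invented for illustration, and a real table would come from binlinks.py.

```shell
# Hypothetical sample table standing in for the output of binlinks.py.
example_table=$(mktemp)
cat > "$example_table" <<'EOF'
bin_1,bin_2,links
bin_a.fa,bin_b.fa,250
bin_a.fa,bin_c.fa,12
bin_d.fa,bin_e.fa,100
EOF

# Keep the header line plus every pair supported by at least MIN_LINKS reads.
MIN_LINKS=100
strong_links=$(awk -F',' -v min="$MIN_LINKS" 'NR == 1 || $3 + 0 >= min' "$example_table")
echo "$strong_links"
```

Pairs that fall below the threshold (bin_a.fa and bin_c.fa in this made-up table) would not be considered for merging.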

Merging bins

You can then launch EukCC on the same binfolder like so:

eukcc folder \
    --out outfolder \
    --threads 8  \
    --links linktable.csv \
    binfolder

EukCC will first run on all bins individually. It will then identify medium-quality bins that are at least 50% complete but not yet more than 100 - improve_percent complete. Next, it looks for bins that are linked to these medium-quality bins by at least 100 paired-end reads. If the quality score goes up after merging, the bins are merged.
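The completeness window for merge candidates can be illustrated with a small filter. This is only a sketch of the rule as described here, not EukCC's actual implementation: the table layout, the bin names, the completeness values, and the IMPROVE_PERCENT value of 5 are all assumptions chosen for the example.

```shell
# Hypothetical per-bin completeness table (not a real EukCC output file).
bin_table=$(mktemp)
cat > "$bin_table" <<'EOF'
bin,completeness
bin_a.fa,62.5
bin_b.fa,31.0
bin_c.fa,97.8
bin_d.fa,88.0
EOF

# Illustrative value only; not necessarily the default used by EukCC.
IMPROVE_PERCENT=5

# A bin is a merge candidate if it is at least 50% complete
# but not more than (100 - IMPROVE_PERCENT)% complete.
candidates=$(awk -F',' -v imp="$IMPROVE_PERCENT" \
    'NR > 1 && $2 >= 50 && $2 <= 100 - imp { print $1 }' "$bin_table")
echo "$candidates"
```

With these made-up numbers, bin_b.fa is too incomplete to serve as a starting point and bin_c.fa is already too complete for a merge to bring a worthwhile improvement, so only bin_a.fa and bin_d.fa remain as candidates.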

Merged bins can be found in the output folder.

Warning

Merging more than two bins, i.e. setting --n_combine to anything above 1, is experimental and not yet recommended. We had very good results with merging two bins.

Example Dataset

I created example data for testing this, based on the lichen study ERP123954. Bins were created using CONCOCT, but any binner without a prokaryotic bias works.

wget ...
unzip eukcc_example_folder_GT57.zip

# Use at least a couple threads to speed it up
eukcc folder --threads 6 \
    --out output \
    --links eukcc_example_folder_GT57/links.csv \
    --n_combine 1 \
    eukcc_example_folder_GT57/bins