Research projects

Feature-Response Curve (FRC): Assembly Metric and Analysis Tool

Inspired by the standard receiver operating characteristic (ROC) curve, the Feature-Response curve characterizes the sensitivity (coverage) of the sequence assembler output (contigs) as a function of its discrimination threshold (number of features/errors). Each contig is assigned a number of features that correspond to doubtful regions of the sequence. Given any such set of features, the response (quality) of the assembler output is then analyzed as a function of the maximum number of possible errors (features) allowed in the contigs.
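As a rough illustration of the idea above, the sketch below computes points of an FRC from a list of contigs. It is not the FRC tool itself. The names frc_point and frc_curve are hypothetical, and so is the (length, features) contig representation. The tallying rule is also an assumption: contigs are taken longest first and accumulated until the next one would push the total feature count past the threshold.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: (contig length in bp, number of features).
Contig = Tuple[int, int]


def frc_point(contigs: Sequence[Contig], genome_size: int, max_features: int) -> float:
    """Approximate coverage reached while allowing at most max_features features.

    Assumed rule: walk contigs from longest to shortest and stop as soon as
    adding the next contig would exceed the feature threshold.
    """
    covered = 0
    used = 0
    for length, features in sorted(contigs, key=lambda c: c[0], reverse=True):
        if used + features > max_features:
            break
        used += features
        covered += length
    return covered / genome_size


def frc_curve(contigs: Sequence[Contig], genome_size: int,
              thresholds: Sequence[int]) -> List[Tuple[int, float]]:
    """One (threshold, coverage) point per feature threshold."""
    return [(t, frc_point(contigs, genome_size, t)) for t in thresholds]


if __name__ == "__main__":
    toy_contigs = [(500, 2), (300, 0), (200, 5)]  # made-up values
    for threshold, coverage in frc_curve(toy_contigs, 1000, [0, 2, 7]):
        print(threshold, coverage)
```

Plotting coverage against the threshold for several assemblers on the same axes gives the comparison described in the properties listed next. A curve that rises faster reaches a given coverage with fewer features.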

Properties:
  • The FRC can be used as a metric to compare the assembly quality of multiple assemblers.
  • The FRC does not require any reference sequence (except an estimate of the genome size) to be used for validation, thus making it a very useful tool in de novo sequencing projects.
  • Separate FRCs can be generated for each feature type, making it possible to scrutinize the relative strengths and weaknesses of different assemblers.

SUTTA: Scoring-and-Unfolding Trimmed Tree Assembler

SUTTA is a new sequence assembly algorithm based on global search methods (e.g., branch-and-bound or beam search). Some of its features are:
  • Technology-agnostic: supports different sequencing technologies with minimal changes to its architecture (currently long Sanger reads and short next-generation Illumina reads).
  • Search strategy: each contig is assembled independently and dynamically, without first building the graph that describes the overlap relations among all the reads.
  • Score-based: score functions evaluate the DNA sequences while they are being assembled. The functions combine different structural properties (e.g., transitivity, coverage, mate pairs, and physical maps).
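To make the search strategy concrete, here is a toy beam search that grows a single contig from a seed read and computes overlaps on the fly, without building an overlap graph first. It is only a sketch under strong assumptions, not SUTTA's implementation. The function names are hypothetical, reads are error-free strings, extension goes in one direction only, and the score is simply the total overlap length rather than the combined structural score described above.

```python
from typing import List, Tuple

State = Tuple[int, Tuple[int, ...], str]  # (score, read indices used, contig so far)


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def beam_assemble(reads: List[str], seed: int, beam_width: int = 2) -> State:
    """Extend a contig to the right from reads[seed], keeping the best partial layouts."""
    beam: List[State] = [(0, (seed,), reads[seed])]
    best = beam[0]
    while beam:
        candidates: List[State] = []
        for score, path, contig in beam:
            last = reads[path[-1]]
            for j, read in enumerate(reads):
                if j in path:
                    continue  # each read is used at most once per layout
                k = overlap(last, read)  # overlaps are computed only when needed
                if k > 0:
                    candidates.append((score + k, path + (j,), contig + read[k:]))
        # Toy score: total overlap length. Keep only the top beam_width layouts.
        candidates.sort(key=lambda s: s[0], reverse=True)
        beam = candidates[:beam_width]
        if beam and beam[0][0] > best[0]:
            best = beam[0]
    return best


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "TACGGA", "GGATTC"]  # made-up reads
    print(beam_assemble(toy_reads, seed=0))
```

A branch-and-bound variant would replace the fixed beam width with pruning of partial layouts whose score cannot beat the best complete layout found so far.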
Resources:
  • Project web-site at NYU Bioinformatics Lab.
Press:
  • Featured in the Bioinformatics for Next Generation Sequencing virtual issue.

PLAN C: Planning with Large Agent-Networks against Catastrophes

PLAN C is an innovative tool for emergency managers, urban planners and public health officials to prepare and evaluate Pareto-optimal plans to respond to urban catastrophic situations. PLAN C was designed and developed at the NYU Bioinformatics Group for the Large-Scale Emergency Readiness project (LaSER), as part of the NYU Center for Catastrophe Preparedness & Response (CCPR).


I-PAES: Immune Pareto Archived Evolution Strategy

I-PAES is a modified version of the multi-objective evolutionary algorithm PAES (Pareto Archived Evolution Strategy), proposed by Knowles and Corne in 1999. It uses a different solution representation (the polypeptide chain) and immune-inspired operators (cloning and hypermutation) to tackle Protein Structure Prediction (PSP) as a Multi-Objective Optimization Problem (MOOP).
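The core of any Pareto-archive method is the dominance test and the archive of mutually non-dominated solutions. The sketch below shows that piece only, under simplifying assumptions: all objectives are minimized, solutions are reduced to tuples of objective values, and the archive has no size limit or crowding control. The function names are hypothetical, and the cloning and hypermutation operators of I-PAES are not modeled.

```python
from typing import List, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: List[Objectives], candidate: Objectives) -> List[Objectives]:
    """Return the archive after offering it a new candidate.

    The candidate is rejected if an archived member dominates or equals it;
    otherwise it is added and every member it dominates is dropped.
    """
    if any(dominates(m, candidate) or m == candidate for m in archive):
        return archive
    return [m for m in archive if not dominates(candidate, m)] + [candidate]


if __name__ == "__main__":
    archive: List[Objectives] = []
    for point in [(3.0, 4.0), (2.0, 5.0), (1.0, 1.0), (2.0, 2.0)]:  # made-up values
        archive = update_archive(archive, point)
    print(archive)
```

In an evolutionary loop, each mutated clone would be evaluated on the energy objectives and offered to the archive in this way.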

A review paper by G. Helles in the Journal of the Royal Society Interface (2008; 5(21): 387-396, DOI: 10.1098/rsif.2007.1278) ranks I-PAES among the best state-of-the-art folding algorithms.

Notes:
I-PAES code uses some external routines from the TINKER Molecular Modeling Package:
  • analyze
  • protein
  • xyzpdb
It also requires the force field parameter set of the CHARMM (version 27) energy function (charmm27.prm). These routines are available for download directly from the TINKER web-site. The "Readme" file in the I-PAES package contains information about installing these external files and compiling the code.

Resources:
  • i-paes.zip - C code of I-PAES including scripts and input files for 1ZDD protein.
  • TINKER Molecular Modeling Package.
Copyright © 2010 Giuseppe Narzisi