I have strong foundations in computer science as well as biology, and I am capable of both algorithm research and practical implementation. My ambition is to leverage my blend of skills to solve pressing biotechnology challenges that will one day lead to affordable bioenergy and personalized medicine.
Throughout my career I have been exposed to a wide variety of programming languages and have worked on a diverse set of bioinformatics problems. I have a long-term interest in biotechnology, and I intend to remain involved with its evolution and maturation.
Specialties:
- Proficient in Python, Perl, SQL, R, C, C++, F#, Matlab, Java.
- Expertise in NGS DNA analysis, data warehousing, whole genome shotgun assembly, machine learning, and algorithm design.
- Fluent in English, Mandarin, and Taiwanese; literate in French.
Bioinformatics Developer @
• Investigated the detection of somatic SNVs in NGS tumor/normal paired samples. Developed custom filters that can screen out more than 95% of the false positive hits produced by Strelka and Virmid.
• Contributed Python code to enable statistical data analysis for MultiOmyx™ immune profiling workflows.
• Created a pipeline to annotate genetic variants and store the processed results in a normalized schema on MSSQL Server.
• Provided Linux administration for the bioinformatics group server. Automated weekly backup to offsite GE cloud.
From April 2013 to Present (2 years 9 months)
Scientist, Scientific Computing @
• Automated the generation of control charts for fermentation metrics.
Used TIBCO Spotfire to pull data from the database, automatically plot three types of control charts, and perform a basic partition-of-variation analysis.
• Curated 8 key yeast strains by performing genome transformations in silico, and inspected the alignments of the computed genomes against available sequence data.
• Ported the production fermentation database to a data warehouse using a star schema.
Designed a star schema that condensed the original database, which contained over 30,000,000 records spread across 71 tables, to a single fact table and 4 dimension tables while retaining the most frequently accessed data.
From June 2011 to February 2013 (1 year 9 months)
Sr. Scientist, Bioinformatics @
• Contributed to the development of an error correction tool for reads sequenced by the SOLiD platform. The tool does not require a reference genome. On a test E. coli dataset, the tool increases the number of mappable reads by 22%, the number of mapping start points by 30%, and the number of error-free reads nearly threefold.
• Provided bioinformatic support for SOLiD customers worldwide. Specialized in problems involving sequence alignment, structural variation, de novo assembly, and error correction. Performed custom bioinformatic analyses for GA Tech, UC Davis, OICR, RML, Emory University, and other institutions.
• Tested and provided feedback on beta versions of the CNV tool, the fragment small indel finder, and the error correction tool. Evaluated and recommended the set of default parameters to use with the new mapping and pairing tools developed in 2009.
From February 2009 to June 2011 (2 years 5 months)
Systems Analyst @
• Selected to become analytics group lead in Dec. 2007. Gathered initial requirements for JGI software pipelines and directed the completion of a paired-end 454 QC tool and a metagenome library QC tool.
• Led a team of software engineers to develop Juniper, a new short read whole genome shotgun assembler.
• Automated the ab initio clustering of EST sequences, and provided a tool to calculate the relative contribution of each library used in the clustering.
• Assembled, analyzed, and released a number of high profile genomes, including Xenopus tropicalis (frog), Physcomitrella patens (moss), and Helobdella robusta (leech).
• Stabilized the in-house JAZZ assembler by resolving more than 800 compiler warnings and over 30,000 memory errors.
From July 2005 to January 2009 (3 years 7 months)
Bioinformatics Intern @
• Inferred the human protein interactome based on evidence from Medline, microarray data, protein interaction databases, and sequence orthology.
• Constructed an interactive visualization of the interactome using Perl and GraphViz.
From June 2004 to June 2005 (1 year 1 month)
Research Assistant @
• Collaborated with Prof. Serafim Batzoglou to devise an algorithm that can perform local alignment between two sets of multiple alignments.
• Implemented the alignment algorithm using C++.
• Analyzed the accuracy and speed of our approach against NCBI BLAST, ClustalW, and PatternHunter.
From June 2003 to August 2003 (3 months)
Research Assistant @
• Collaborated with Prof. Avi Pfeffer to find novel sampling-based probabilistic inference algorithms.
• Implemented a new sampling algorithm in Matlab using Kevin Murphy’s Bayes Net Toolbox.
• Submitted technical paper “Improving the Performance of Sampling in Bayesian Networks with Multisampling” to UAI 2003 (Uncertainty in AI) conference.
From September 2002 to May 2003 (9 months)
M.S., Bioinformatics @ Boston University, 2004 to 2005
B.S., Computer Science and Mathematics @ University of Toronto, 1999 to 2002
B.A., Molecular Biochemistry and Biophysics @ Yale University, 1995 to 1999
Hank Tu is skilled in: DNA sequencing, Bioinformatics, Perl, Machine Learning, Algorithms, Matlab, Biotechnology, C++, Java, Databases, Genomics, SQL, R, R&D, Molecular Biology
Websites:
http://www.amyris.com