I have over fifteen years of experience in software development, with a focus on big data and real-time analytics in recent years. Currently I work as a software engineer on LinkedIn’s data infrastructure team, improving products and services for our customers by using big-data tools on Hadoop, building real-time analytics systems, creating data pipelines for data warehouses, and developing data mining and machine learning platforms. Previously I was a software engineer at Microsoft, where I was a member of the Bing Ads (adCenter) founding engineering team and helped build the adCenter ETL team from scratch. Later I also worked on Bing telemetry systems and Bing Satori knowledge graphs.
Staff Software Engineer @
• Implemented various ETL data pipelines using Hadoop, Pig, Spark, Azkaban, AppWorx, MySQL, and Teradata.
• Migrated the Voice of Member and Customer system from weekly batch processing to real-time processing based on Elasticsearch.
• Responsible for the development of a new data warehouse system for LinkedIn Help Center.
• Worked with Business Analytics team on Data Mart for Data Mining (DM2) related features.
• Investigated various approaches for performance improvements on Hadoop.
From January 2015 to Present (1 year)
Senior Software Engineer @
• Responsible for the design and implementation of self-serve infrastructure of the Bing knowledge data reporting platform.
• Developed many critical features in the monitoring systems for Bing’s service availability and latency.
• Built end-to-end data processing ETL pipelines and workflows based on the Data Warehouse Controller platform to achieve strict reporting SLAs.
• Worked with data analysts to improve the performance of the Bing search engine, and troubleshot many live-site issues.
• Optimized performance of Cosmos virtual clusters and data processing pipelines with many different approaches.
From December 2011 to January 2015 (3 years 2 months)
Senior Development Lead @
• Implemented various BI applications to generate reports for both advertisers and publishers, based on massive amounts of raw web data.
• Designed automatic testing/validation applications with C#/.NET for data processing systems.
• Managed the migration of legacy ETL applications to MapReduce-based distributed systems that process terabytes of data daily.
• Worked with database teams to implement efficient and robust data distribution/extraction/loading interfaces.
• Developed a comprehensive BI reporting/OLAP system for Mobile Advertising with SQL Server Business Intelligence Suite (SSIS/SSAS/SSRS).
From May 2005 to December 2011 (6 years 8 months)
Software Engineer @
• Developed graphical user interfaces (GUIs) for Windows client applications using Visual C++/MFC.
• Maintained and enhanced a third-party MFC software package, Ultimate Grid, which was used extensively in data entry GUI applications.
• Worked on the IWR (Interactive Web Responder) website for health insurance participants’ online information, based on J2EE, Apache, and BEA WebLogic/Portal.
• Created an FTP file-transfer monitor and a Java FTP client for automatic electronic data processing.
• Developed DLLs to allow Java applications on Windows machines to invoke the applications in the Tuxedo domain on UNIX/VMS.
• Implemented middle-tier service routines to connect MFC client applications to the Oracle database server.
From June 2001 to May 2005 (4 years)
Master of Science (M.S.), Computer Science @ University of South Carolina-Columbia
From 1995 to 2000
Doctor of Philosophy (Ph.D.), Engineering @ University of South Carolina-Columbia
From 1995 to 2000
Bin (Henry) is skilled in: Distributed Systems, Microsoft SQL Server, Databases, C#, Software Development, Software Engineering, Agile Methodologies, Data Warehousing, Business Intelligence, Software Design, .NET, Java, Hadoop, Performance Tuning, Scalability