I am a passionate, highly motivated, and experienced software engineer with a strong research and development background. Specifically, I focus on the design and implementation of analytics-driven, scalable, fault-tolerant, and energy-efficient resource management systems and algorithms for virtualized data centers. I am also interested in the performance analysis and evaluation of big data management systems, automated storage provisioning, data placement strategies, and life-cycle management of applications in cloud computing environments. I am a member of the Management and Operation of Complex Systems Group in the CTO office (Group Function Technology) at the Ericsson Research Silicon Valley Lab.
Specialties: Distributed systems, Big data analytics, Cloud computing, Virtualization, Energy-efficient resource management, Fault tolerance, High availability, Performance analysis.
Senior Research Engineer @ Working at the intersection of systems and machine learning on the design and implementation of analytics-driven distributed systems. Specifically, my activities include (among others):
(1) Design and implementation of a distributed, analytics-driven, SLO-aware platform capable of orchestrating services across thousands of data centers. At its core, the system integrates a machine learning pipeline (data collection, model building, and real-time model consumption) for anomaly detection. So far, I have developed and demonstrated a multi-datacenter service orchestrator along with a per-host passive monitoring service that exposes host-, container- (cgroups), and service-level metrics. The current prototype relies heavily on existing open-source technology such as HDFS, Tachyon, ZooKeeper, etcd, Docker, Mesos, Marathon, and Spark (Streaming, MLlib). Two patent applications based on this work have been accepted.
(2) Design and implementation of a distributed real-time big data analytics platform based on Apache Spark and alternative data processing frameworks (Apache Storm, Hadoop MapReduce). So far, I have developed and demonstrated a "connected car" application using Spark Streaming and am currently implementing a lambda architecture to enable movie recommendations at scale.
(3) Integration of various big data analytics services with the Apcera Hybrid Cloud OS (working jointly with Business Unit Cloud).
(4) Collaboration with the UC Berkeley AMPLab and CMU Silicon Valley. So far, I have led projects in which CMU master's and PhD students work on machine learning problems in cloud resource management and recommender systems. This work has resulted in one conference publication at IEEE CLOUD 2015 and three accepted patent applications.

From April 2014 to Present (1 year 7 months) San Jose, California

Postdoctoral Researcher @ I was a member of the Data Science and Technology Department (see http://dst.lbl.gov), where I worked in the Integrated Data Frameworks Group on the design and implementation of a software ecosystem to facilitate seamless data analysis across desktop, HPC, and cloud environments. Specifically, my work centered on the following projects:
(1) Intelligent storage and data management in clouds. I contributed to the design, implementation, and evaluation of the FRIEDA data management system (see http://frieda.lbl.gov): extended FRIEDA to enable application execution on Amazon EC2, developed a command-line interface to easily plug applications into FRIEDA on EC2 and OpenStack clouds, and ran experiments with scientific applications on EC2. I also collaborated closely with a physicist to leverage FRIEDA for processing data from the ATLAS experiment at CERN.
(2) Designed and implemented software to automate metadata extraction for over 100 AmeriFlux (see http://ameriflux.lbl.gov) tower sites, generate summaries, and notify the sites' principal investigators.
(3) File system performance analysis on Amazon EC2, focusing on the Lustre file system. This work was done in collaboration with the Intel High Performance Data Division.
(4) Emulation of next-generation infrastructures to serve data- and compute-intensive applications. Ran experiments using Linux containers on the FutureGrid experimentation testbed. This work was done in collaboration with HP Labs.

From January 2013 to April 2014 (1 year 4 months) Berkeley, California

Doctoral Researcher @ I worked on autonomic and energy-efficient virtual machine (VM) management in large-scale virtualized data centers. My contributions were twofold:
(1) Designed and implemented a scalable, autonomic, and energy-efficient VM management system called Snooze. For scalability and autonomy, Snooze is based on a self-configuring and self-healing hierarchical architecture. For energy efficiency, Snooze provides a holistic energy management solution that integrates VM resource (CPU, memory, network Tx/Rx) utilization monitoring and estimation, server underload and overload mitigation, VM consolidation, and power management mechanisms. The system has been extensively evaluated at large scale (thousands of system services) on the Grid'5000 experimentation testbed and shown to be scalable, autonomic, and energy-efficient. It is available as open-source software under the GPL v2 license at http://snooze.inria.fr. The source code (written in Java) is hosted on GitHub: https://github.com/snoozesoftware/
(2) A novel VM placement algorithm based on the Ant Colony Optimization (ACO) meta-heuristic. ACO is especially attractive for VM placement due to its polynomial worst-case time complexity, near-optimal solutions, and ease of parallelization. This work was evaluated by simulation, with the IBM ILOG CPLEX solver used to compute the optimal solutions for comparison; a simplified illustrative sketch of the placement idea appears below. To enable scalable VM consolidation, the thesis makes two further contributions: (i) an ACO-based VM consolidation algorithm; (ii) a fully decentralized VM consolidation system based on an unstructured peer-to-peer network. The proposed system was evaluated by emulation on the Grid'5000 testbed. The results show that the system is scalable and achieves a data center utilization close to that obtained by a centralized consolidation algorithm.
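For illustration only, the following is a minimal, self-contained Java sketch of the general ACO idea applied to single-resource VM placement. It is not the thesis algorithm or the Snooze code: the class name, parameter values, VM demands, and host capacities are made-up assumptions, and the heuristic simply biases ants toward already-loaded hosts to encourage consolidation.

import java.util.Arrays;
import java.util.Random;

// Hypothetical ACO-style sketch: place VMs on hosts so that few hosts stay active,
// subject to a single CPU capacity constraint per host.
public class AcoVmPlacementSketch {

    static final int ANTS = 20, ITERATIONS = 100;
    static final double ALPHA = 1.0;  // pheromone influence
    static final double BETA = 2.0;   // heuristic influence
    static final double RHO = 0.1;    // pheromone evaporation rate
    static final Random RNG = new Random(42);

    public static void main(String[] args) {
        double[] vmCpu   = {0.4, 0.3, 0.6, 0.2, 0.5, 0.7, 0.1}; // example VM CPU demands
        double[] hostCap = {1.0, 1.0, 1.0, 1.0};                // example host CPU capacities

        double[][] pheromone = new double[vmCpu.length][hostCap.length];
        for (double[] row : pheromone) Arrays.fill(row, 1.0);

        int[] best = null;
        int bestActive = Integer.MAX_VALUE;

        for (int it = 0; it < ITERATIONS; it++) {
            for (int ant = 0; ant < ANTS; ant++) {
                int[] placement = constructPlacement(vmCpu, hostCap, pheromone);
                if (placement == null) continue; // this ant failed to build a feasible placement
                int active = countActiveHosts(placement, hostCap.length);
                if (active < bestActive) { bestActive = active; best = placement; }
            }
            // Evaporate all trails, then reinforce the (VM, host) pairs of the best placement so far.
            for (double[] row : pheromone)
                for (int h = 0; h < row.length; h++) row[h] *= (1.0 - RHO);
            if (best != null)
                for (int vm = 0; vm < best.length; vm++)
                    pheromone[vm][best[vm]] += 1.0 / bestActive;
        }
        System.out.println("Active hosts: " + bestActive + ", placement: " + Arrays.toString(best));
    }

    // One ant builds a complete placement, VM by VM, choosing hosts probabilistically.
    static int[] constructPlacement(double[] vmCpu, double[] hostCap, double[][] pheromone) {
        double[] load = new double[hostCap.length];
        int[] placement = new int[vmCpu.length];
        for (int vm = 0; vm < vmCpu.length; vm++) {
            double[] score = new double[hostCap.length];
            double total = 0.0;
            for (int h = 0; h < hostCap.length; h++) {
                if (load[h] + vmCpu[vm] > hostCap[h]) continue; // capacity check
                double eta = 1.0 + load[h];                     // heuristic: prefer loaded hosts (consolidation)
                score[h] = Math.pow(pheromone[vm][h], ALPHA) * Math.pow(eta, BETA);
                total += score[h];
            }
            if (total == 0.0) return null;                      // no feasible host for this VM
            // Roulette-wheel selection proportional to the combined pheromone/heuristic score.
            double r = RNG.nextDouble() * total;
            int chosen = -1;
            for (int h = 0; h < hostCap.length; h++) {
                r -= score[h];
                if (score[h] > 0 && r <= 0) { chosen = h; break; }
            }
            if (chosen < 0) // floating-point edge case: fall back to the last feasible host
                for (int h = hostCap.length - 1; h >= 0; h--) if (score[h] > 0) { chosen = h; break; }
            placement[vm] = chosen;
            load[chosen] += vmCpu[vm];
        }
        return placement;
    }

    static int countActiveHosts(int[] placement, int numHosts) {
        boolean[] used = new boolean[numHosts];
        for (int h : placement) used[h] = true;
        int count = 0;
        for (boolean u : used) if (u) count++;
        return count;
    }
}

A realistic implementation would extend the heuristic and pheromone update to multiple resource dimensions (CPU, memory, network) and account for the migration cost of moving already-placed VMs; this sketch only conveys the construct-evaluate-reinforce loop that makes ACO attractive for the problem.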
For more details please see: http://hal.inria.fr/tel-00785090/en

From December 2009 to December 2012 (3 years 1 month) Rennes Area, France

Research Intern @ Analyzed the performance and energy efficiency of Hadoop deployments with collocated and separated data and compute layers, using scientific data-intensive applications on physical and virtual clusters. The experiments were conducted on 33 power-metered servers of the Grid'5000 experimentation testbed. This work was presented and published at the IEEE BigData 2013 conference (main track).

From July 2012 to September 2012 (3 months) Berkeley, California

Master Research Intern @ Contributed to the XtreemOS grid operating system (developed by around 130 people). Extended the distributed checkpointing service (written in Java) in three ways:
(1) Designed and implemented independent checkpointing without message logging.
(2) Designed and implemented a user-space checkpoint callback library.
(3) Implemented a JNI translation library to support the latest LinuxSSI/Kerrighed (http://www.kerrighed.org) kernel-level checkpointer version.

From March 2009 to August 2009 (6 months) Rennes Area, France

Student Research Assistant @ As part of the operating systems group, I was involved in the design and implementation of the first XtreemOS distributed checkpointing service prototype. I also designed and implemented incremental checkpointing at the kernel level in the Kerrighed Single System Image operating system for clusters. This work was presented at the Ottawa Linux Symposium 2009.

From June 2007 to February 2009 (1 year 9 months) Düsseldorf Area, Germany
Doctor of Philosophy (Ph.D.), Computer Science, summa cum laude (with highest honor) @ Université de Rennes I From 2009 to 2012

Master's Degree, Computer Science, excellent (with honor) @ Heinrich-Heine-Universität Düsseldorf From 2008 to 2009

Bachelor's Degree, Computer Science, very good @ Heinrich-Heine-Universität Düsseldorf From 2005 to 2008

Eugen Feller is skilled in: Distributed Systems, Cloud Computing, Software Development, Algorithms, Virtualization, Linux, Java, Fault Tolerance, Programming, Computer Science, Big Data, High Availability, Autonomic Computing, LaTeX, Hadoop, C, Shell Scripting, MapReduce, Python, Scalability, Resource Management, Operating Systems, OOP, Design Patterns, Multithreading, Maven, Subversion, Git, Amazon Web Services..., Open Source Software
Websites:
http://www.eugen-feller.com