I am an experienced RTL engineer with proven capability in the design and implementation of non-trivial micro-architectures.
Key skills:
• I have proven expertise in complex micro-architectures, including non-trivial pipelines (stalls, replays, interleaving, forwarding), out-of-order execution, speculation, caching, and virtualization (PCIe, SR-IOV).
• I have knowledge of C/C++/Perl/Python and assembly.
• I can design highly configurable RTL. A Python-based Verilog pre-processor ('SH4'), together with a number of templates and testbenches, is available at https://github.com/stephenry/sh4.
Principal Engineer
Micro-architect on a Natural Language Processing (NLP) hardware accelerator. From March 2015 to Present (10 months).
Senior Hardware Engineer
At Broadcom, I was responsible for the architecture and implementation of two major subsystems:
• Ingress Pipeline Descriptor Former (DF) (Architecture and Implementation):
Subsystem responsible for allocating incoming packets in memory, generating DMA commands, and raising interrupts on completion. Notable features: management and replenishment of a 'free pool' of non-contiguous buffers in memory; support for RDMA; and support for packet coalescing via 'large receive offload' (LRO), implemented as 4-way LRO as opposed to the 1-way LRO found in all other competing implementations.
• Networking Direct Memory Access Controller (DMAC) (Architecture and Implementation):
DMA controller for the Networking Complex (a team of 8-10 RTL engineers), responsible for transferring networking traffic to and from main memory. Notable features: a credit-based read-scheduling mechanism that used speculation to schedule DMA commands pre-emptively, before it was known whether they could execute; an out-of-order, non-blocking TLB that translated guest DMA commands and performed SR-IOV-related operations (ATS, PRI, INV); and an out-of-order bus interface with a re-order buffer. The design was highly parameterized using a Perl-based pre-processor (with some RTL generated procedurally in pure Perl) to support three distinct configurations. From June 2013 to March 2015 (1 year 10 months).
Senior Engineer
• ARC HS Microprocessor
Worked on the execution unit (integer data path) of the ARC HS microprocessor.
• ARC EM Microprocessor
Highly parameterized RTL implementation of the ARC2.1 ISA Interrupt/Exception Architecture in the EM microprocessor. Notable aspects of the design: the degree of parameterization, which required sophisticated use of a pre-processor; CISC-like instructions, which were non-trivial to implement because atomic updates to the machine state had to be preserved even under exceptional conditions; the introduction of multiple register files; considerable reworking of the exception and reset mechanisms; and considerable additions to the machine state. From March 2011 to June 2013 (2 years 4 months).
Senior Hardware Design Engineer
• Hardware architect for the 'IoDyssey' (Imaging on Demand) platform.
An FPGA-based video processing system for land-based defense vehicles. The system contained logic to capture video, perform lossless JPEG2000 compression, and transmit the compressed bitstream over an attached military radio system. From June 2007 to June 2010 (3 years 1 month).
EngD candidate
Industry-based doctoral candidate working on video compression algorithms for novel reconfigurable architectures. From 2003 to 2007 (4 years).
Education:
Master of Science (MSc), Computer Science @ The University of Edinburgh, from 2010 to 2011
Bachelor's Degree, Electrical and Electronics Engineering, 1st Class @ University of Strathclyde, from 1999 to 2003
Doctor of Philosophy (Ph.D.), Computer Architecture, did not complete @ Institute for System Level Integration, from 2003 to 2007
Stephen Henry is skilled in: Verilog, ASIC, RTL Design, SoC, Debugging, Microprocessors, Perl, FPGA, Embedded Systems, Hardware Architecture, Emacs, C, SystemVerilog, Computer Architecture, Semiconductors.