Selected work

My Portfolio

A mix of academic, professional and side projects across embedded intelligence, TinyML, semiconductor reliability and the web.

In progress

Master's thesis research at the University of Siegen, exploring how open-source RISC-V-style accelerators can run quantized neural networks at the edge.

Hardware-Accelerated DL with VTA

Optimizing and deploying Deep Learning models onto FPGA-based hardware accelerators using the Versatile Tensor Accelerator (VTA) framework for real-time applications.

FPGA
Deep Learning
VTA
Hardware

Completed

Group project at the University of Siegen with Saju Khakurel and Thong Phan, supervised by Prof. Dr. Kristof Van Laerhoven (July 2025).

Efficient Human Activity Recognition on Memory-Constrained Smartwatches

TinyML study deploying CNN, LSTM, Hybrid CNN-LSTM and DeepConvLSTM models on the Bangle.js 2 smartwatch (256 KB RAM, 1 MB Flash) for real-time activity classification across the HANG Time-HAR, PAMAP2 and WEAR datasets.

Best accuracy (HANG Time-HAR): 79.3%
Datasets evaluated: 3
Architectures benchmarked: 4+
Target device RAM: 256 KB

Benchmarked four lightweight architectures across three HAR datasets under strict memory and latency budgets.
DeepConvLSTM reached 79.3% test accuracy on the 19-class HANG Time-HAR basketball dataset.
SeparableConv1D CNN delivered the best efficiency on WEAR — strongest fit for actual on-device deployment.
Full TFLite Micro pipeline: preprocessing, MinMax scaling, post-training int8 quantization and emulator validation.

Problem & motivation

Smartwatches are everywhere, but running deep learning HAR models on them is hard: the Bangle.js 2 offers only 256 KB of RAM and 1 MB of Flash, which rules out standard DL stacks. TinyML closes this gap with compact, quantized models that keep inference local, private and low-latency.

Case study 1 — HANG Time-HAR (basketball)

Classified 19 basketball activities from wrist-worn 3-axis accelerometer data (50 Hz, ±8 g) recorded by 24 players. Trained CNN (4,124 params), LSTM, Hybrid CNN-LSTM and DeepConvLSTM (41,324 params) models with a 60/20/20 split on MinMax-scaled 20×3 windows. DeepConvLSTM was the strongest at 79.33% test accuracy; the LSTM collapsed to ~9.6%, which suggests that convolutional feature extraction is essential on this dataset.
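The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes that 20×3 means 20 time samples by 3 axes, that windows do not overlap, that the 60/20/20 split is a random train/validation/test split over windows, and that the scaler is fitted on the training portion only. The data is synthetic and all function names are hypothetical.

```python
import numpy as np

def make_windows(signal, window=20, step=20):
    """Cut a (T, 3) accelerometer stream into (N, window, 3) windows."""
    starts = range(0, len(signal) - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])

def minmax_fit(x):
    """Per-axis minimum and maximum over all windows and time steps."""
    return x.min(axis=(0, 1)), x.max(axis=(0, 1))

def minmax_apply(x, lo, hi):
    """Scale each axis with the fitted range; guard against a zero range."""
    return (x - lo) / np.maximum(hi - lo, 1e-8)

rng = np.random.default_rng(0)
# Synthetic stand-in for a +/-8 g, 3-axis recording (assumption, not real data).
stream = rng.uniform(-8.0, 8.0, size=(1000, 3))
windows = make_windows(stream)

# 60/20/20 split over shuffled windows (assumed to be random, per window).
idx = rng.permutation(len(windows))
n_train = len(windows) * 6 // 10
n_val = len(windows) * 2 // 10
train, val, test = np.split(windows[idx], [n_train, n_train + n_val])

# Fit the scaler on the training windows only, then reuse it everywhere.
lo, hi = minmax_fit(train)
train_s = minmax_apply(train, lo, hi)
val_s = minmax_apply(val, lo, hi)
test_s = minmax_apply(test, lo, hi)
```

Fitting the scaler on the training windows alone keeps validation and test statistics out of the preprocessing, at the cost of occasional scaled values slightly outside [0, 1] on unseen data.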

Case study 2 — PAMAP2

Re-evaluated the same family of architectures on the PAMAP2 daily-activity benchmark to test generalization across sensor placements and activity types.

Case study 3 — WEAR

Adapted a Bangle.js gesture-recognition pipeline to the WEAR fitness dataset and compared a Baseline CNN, SeparableConv1D CNN, Multi-branch CNN and Multi-scale CNN in both FP32 and INT8. SeparableConv1D delivered the best accuracy-per-byte trade-off and was selected as the deployment candidate.
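One way to see why a separable convolution tends to win on accuracy per byte is to count parameters. The sketch below compares a standard 1D convolution with a depthwise-separable one (depthwise kernel, pointwise kernel, one output bias, depth multiplier 1). The layer sizes are hypothetical and are not taken from the project's models.

```python
def conv1d_params(kernel, c_in, c_out):
    """Standard Conv1D: one kernel per (input, output) channel pair, plus biases."""
    return kernel * c_in * c_out + c_out

def separable_conv1d_params(kernel, c_in, c_out):
    """Depthwise (kernel * c_in) + pointwise (c_in * c_out) + output biases."""
    return kernel * c_in + c_in * c_out + c_out

# Hypothetical layer: kernel size 5, 32 input channels, 64 output channels.
standard = conv1d_params(5, 32, 64)
separable = separable_conv1d_params(5, 32, 64)
ratio = standard / separable

print(f"standard:  {standard} params")
print(f"separable: {separable} params")
print(f"reduction: {ratio:.2f}x")
```

For this illustrative layer the separable variant needs 2,272 parameters instead of 10,304, about a 4.5× reduction, and the gap widens as kernel size and channel counts grow.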

Deployment & optimization

Models were trained in Keras and converted for TensorFlow Lite for Microcontrollers with post-training int8 quantization. Because the physical device was unavailable, they were validated in the Bangle.js emulator (TFLite → JSON); inference latency stayed between 7 and 27 ms.
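To illustrate what int8 quantization buys on a device this small, here is a minimal NumPy sketch of generic asymmetric affine quantization applied to a random weight vector with the same parameter count as the DeepConvLSTM (41,324). It is not the TensorFlow Lite implementation, which chooses scales per tensor or per channel with its own rules; the weights are synthetic, the function names are hypothetical, and the byte counts cover weight storage only, ignoring activations and runtime overhead.

```python
import numpy as np

def quantize_int8(x):
    """Asymmetric affine quantization of a float array to int8."""
    lo = min(float(x.min()), 0.0)  # make sure 0.0 is representable
    hi = max(float(x.max()), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = int(round(-128 - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
n_params = 41_324  # DeepConvLSTM parameter count quoted in the case study
weights = rng.normal(0.0, 0.1, size=n_params).astype(np.float32)  # synthetic

q, scale, zero_point = quantize_int8(weights)
max_err = float(np.abs(dequantize(q, scale, zero_point) - weights).max())

fp32_bytes = weights.nbytes  # 4 bytes per parameter
int8_bytes = q.nbytes        # 1 byte per parameter
print(f"FP32 weights: {fp32_bytes / 1024:.1f} KiB")
print(f"INT8 weights: {int8_bytes / 1024:.1f} KiB")
print(f"max round-trip error: {max_err:.5f} (scale {scale:.5f})")
```

Under these assumptions the weights shrink from about 161 KiB to about 40 KiB, which is the difference between consuming most of a 256 KB budget and leaving room for activations and the runtime.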

TinyML
CNN
LSTM
DeepConvLSTM
TensorFlow Lite Micro
Quantization
Bangle.js 2
Edge AI

Completed

Literature review for Semiconductor Electronics Design, EMINENT M.Sc. programme, January 2026.

Error Correction Codes Below 28 nm — Literature Review

Semiconductor Electronics Design literature review on adaptive Error Correction Codes (ECC) for sub-28 nm memories, covering soft errors, multi-cell upsets and automated management of ECC features across chip revisions.

DFMC efficacy: 99.956%
Energy saving vs static LPC: ~41.35%
Feature propagation precision: ~99%
References: 4 IEEE/ACM sources

Surveyed why sub-28 nm FinFET/GAAFET nodes are increasingly vulnerable to Single Event Upsets and Multi-Cell Upsets.
Compared Parity, Hamming (SEC-DED), Line Product Code (LPC) and Chipkill across reliability, area and energy.
Analyzed the Dynamic Fault-Tolerant Memory Controller (DFMC) — adaptive ECC at 28 nm CMOS, 1 V.
Studied automated propagation of ECC feature revisions in preprocessor-based Software Product Lines.
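As a concrete reference point for the SEC-DED entry in the comparison above, the sketch below implements a textbook Hamming(7,4) code extended with an overall parity bit. It corrects any single bit error and detects (but cannot correct) double errors. It is a toy 4-bit example for illustration; real memory ECC works on much wider words, and none of this code comes from the reviewed papers.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    code = [0] * 8  # index 0 unused so indices match bit positions
    code[3], code[5], code[6], code[7] = data
    for p in (1, 2, 4):  # parity bits sit at power-of-two positions
        for i in range(1, 8):
            if i != p and i & p:
                code[p] ^= code[i]
    return code[1:]

def syndrome(word7):
    """XOR of the positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos, bit in enumerate(word7, start=1):
        if bit:
            s ^= pos
    return s

def secded_encode(data):
    """Hamming(7,4) plus one overall parity bit -> 8-bit SEC-DED word."""
    word = hamming74_encode(data)
    return word + [sum(word) % 2]

def secded_decode(word8):
    """Return (data, status); data is None when a double error is detected."""
    code = list(word8[:7])
    s = syndrome(code)
    overall = sum(word8) % 2
    if s and not overall:
        return None, "double error detected"
    status = "ok"
    if s:                      # single error inside the 7-bit codeword
        code[s - 1] ^= 1
        status = "corrected"
    elif overall:              # single error in the overall parity bit
        status = "corrected"
    return [code[2], code[4], code[5], code[6]], status

data = [1, 0, 1, 1]
word = secded_encode(data)

single = word[:]
single[4] ^= 1                 # flip one bit
double = word[:]
double[1] ^= 1                 # flip two bits
double[6] ^= 1

print(secded_decode(word))
print(secded_decode(single))
print(secded_decode(double))
```

The double-error case is exactly where multi-cell upsets hurt: SEC-DED can flag the word as bad but cannot repair it, which motivates stronger codes such as LPC or Chipkill.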

The scaling challenge

As critical charge (Q_crit) shrinks at 16 nm, 7 nm and below, even low-energy neutrons and alpha particles cause bit-flips. Manufacturing variation, NBTI and HCI aging further erode reliability, making ECC a first-class architectural concern rather than an afterthought.

Adaptive ECC: the DFMC architecture

Stefani et al. (SBCCI 2023) propose a Dynamic Fault-Tolerant Memory Controller that switches between Parity, Hamming and LPC per memory block, driven by an evaluator and a threshold engine. In 28 nm CMOS at 1 V it reaches 99.956% correction efficacy while using ~41.35% less energy than a static LPC controller in low-error scenarios. The cost is controller area, which grows from 1,446 µm² (no ECC) to 8,205 µm², roughly 5.7×.
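The per-block switching idea can be pictured with a small software model. The sketch below is only an illustration of threshold-based escalation between the three schemes; the class name, the window-based error counting and the threshold values are all assumptions made for this example and do not reproduce the published hardware design.

```python
from dataclasses import dataclass

LEVELS = ("parity", "hamming", "lpc")  # weakest to strongest protection

@dataclass
class BlockState:
    level: int = 0   # index into LEVELS
    errors: int = 0  # errors observed in the current window

class AdaptiveEccSelector:
    """Toy per-block ECC selector driven by error counts per window."""

    def __init__(self, n_blocks, up_threshold=2, down_threshold=0):
        # Thresholds are hypothetical values chosen for the example.
        self.blocks = [BlockState() for _ in range(n_blocks)]
        self.up = up_threshold
        self.down = down_threshold

    def record_error(self, block):
        self.blocks[block].errors += 1

    def end_window(self):
        """Escalate noisy blocks, relax quiet ones, then reset the counters."""
        for st in self.blocks:
            if st.errors >= self.up and st.level < len(LEVELS) - 1:
                st.level += 1
            elif st.errors <= self.down and st.level > 0:
                st.level -= 1
            st.errors = 0

    def scheme(self, block):
        return LEVELS[self.blocks[block].level]

selector = AdaptiveEccSelector(n_blocks=2)
history = []
for errors_on_block0 in (2, 3, 0):  # two noisy windows, then a quiet one
    for _ in range(errors_on_block0):
        selector.record_error(0)
    selector.end_window()
    history.append((selector.scheme(0), selector.scheme(1)))

print(history)
```

Block 0 escalates from parity to Hamming to LPC while errors persist and steps back down once a window passes without errors, while the quiet block 1 stays on the cheapest scheme throughout. That asymmetry is where an adaptive controller can save energy relative to running the strongest code everywhere.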

Managing ECC features across chip revisions

Michelon et al. (SANER 2023) treat ECC variants as features in a Software Product Line gated by #ifdefs. Their tool mines feature commits, computes deltas between source and destination releases and propagates ECC implementations automatically — ~99% precision/recall and ~63 s per propagation, avoiding error-prone manual integration of reliability logic.
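To make the idea of an #ifdef-gated feature delta concrete, here is a heavily simplified sketch. It extracts the lines guarded by one feature macro from two versions of a file and reports what was added between them. This is not the authors' tool: the macro name, the C fragments and the helper functions are hypothetical, and the parser ignores #else/#elif branches and compound conditions.

```python
import difflib

def feature_lines(source, feature):
    """Return the lines guarded by `#ifdef <feature>` (nested blocks included)."""
    out, depth, opened_at = [], 0, None
    for line in source.splitlines():
        s = line.strip()
        if s.startswith(("#ifdef", "#ifndef", "#if ")):
            depth += 1
            if opened_at is None and s.split()[:2] == ["#ifdef", feature]:
                opened_at = depth
                continue  # do not include the opening directive itself
        elif s.startswith("#endif"):
            if opened_at == depth:
                opened_at = None
                depth -= 1
                continue  # do not include the closing directive itself
            depth -= 1
        if opened_at is not None:
            out.append(line)
    return out

def added_lines(old, new):
    """Lines present in `new` but not in `old`, according to a unified diff."""
    diff = difflib.unified_diff(old, new, lineterm="")
    return [l[1:] for l in diff if l.startswith("+") and not l.startswith("+++")]

# Hypothetical source and destination releases of the same file.
release_a = """int read_word(int addr) {
#ifdef ECC_HAMMING
  check_parity(addr);
#endif
  return mem[addr];
}"""

release_b = """int read_word(int addr) {
#ifdef ECC_HAMMING
  check_parity(addr);
  correct_single_bit(addr);
#endif
  return mem[addr];
}"""

old_feature = feature_lines(release_a, "ECC_HAMMING")
new_feature = feature_lines(release_b, "ECC_HAMMING")
delta = added_lines(old_feature, new_feature)
print(delta)
```

A real propagation tool also has to map this delta onto a destination release whose surrounding code may have diverged, which is the hard part the paper addresses and this sketch does not.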

Outlook

The review argues for hardware-software synergy at sub-28 nm: ML-driven error prediction inside the memory controller, version-control-integrated feature propagation, and energy-aware codes like LPC replacing heavyweight Chipkill in high-density 3D memories.

Semiconductors
ECC
Reliability
FinFET
Memory Controllers
Research