Selected work

My Portfolio

A mix of academic, professional and side projects across embedded intelligence, TinyML, semiconductor reliability and the web.

In progress: Master's thesis research at the University of Siegen, exploring how open-source RISC-V-style accelerators can run quantized neural networks at the edge.

Hardware-Accelerated DL with VTA

Optimizing and deploying Deep Learning models onto FPGA-based hardware accelerators using the Versatile Tensor Accelerator (VTA) framework for real-time applications.

  • FPGA
  • Deep Learning
  • VTA
  • Hardware
Completed: Group project at the University of Siegen with Saju Khakurel and Thong Phan, supervised by Prof. Dr. Kristof Van Laerhoven (July 2025).

Efficient Human Activity Recognition on Memory-Constrained Smartwatches

TinyML study deploying CNN, LSTM, Hybrid CNN-LSTM and DeepConvLSTM models on the Bangle.js 2 smartwatch (256 KB RAM, 1 MB Flash) for real-time activity classification across the HANG Time-HAR, PAMAP2 and WEAR datasets.

Best accuracy (HANG Time-HAR): 79.3%
Datasets evaluated: 3
Architectures benchmarked: 4+
Target device RAM: 256 KB
  • Benchmarked four lightweight architectures across three HAR datasets under strict memory and latency budgets.
  • DeepConvLSTM reached 79.3% test accuracy on the 19-class HANG Time-HAR basketball dataset.
  • SeparableConv1D CNN delivered the best efficiency on WEAR — strongest fit for actual on-device deployment.
  • Full TFLite Micro pipeline: preprocessing, MinMax scaling, post-training int8 quantization and emulator validation.

Problem & motivation

Smartwatches are everywhere, but running deep learning HAR models on them is hard: the Bangle.js 2 offers only 256 KB of RAM and 1 MB of Flash, which rules out standard DL stacks. TinyML closes this gap with models small enough to run on the watch itself, keeping inference local, private and low-latency.

Case study 1 — HANG Time-HAR (basketball)

Classified 19 basketball activities from wrist-worn 3-axis accelerometer data (50 Hz, ±8 g) recorded by 24 players. Trained CNN (4,124 params), LSTM, Hybrid CNN-LSTM and DeepConvLSTM (41,324 params) models with a 60/20/20 split on MinMax-scaled windows of 20 samples × 3 axes (0.4 s at 50 Hz). DeepConvLSTM was the strongest at 79.33% test accuracy; the LSTM collapsed to ~9.6%, which suggests that convolutional feature extraction is essential on this dataset.
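
The windowing and MinMax scaling step can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual preprocessing code: the function names, the stride of 10 samples and the synthetic stream are assumptions made for the example. Only the 20×3 window shape, the 50 Hz rate and the ±8 g range come from the description above.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # one (x, y, z) accelerometer reading


def fit_minmax(samples: Sequence[Sample]) -> Tuple[List[float], List[float]]:
    """Return the per-axis minimum and maximum over the given samples."""
    mins = [min(s[a] for s in samples) for a in range(3)]
    maxs = [max(s[a] for s in samples) for a in range(3)]
    return mins, maxs


def scale(sample: Sample, mins: List[float], maxs: List[float]) -> List[float]:
    """Map each axis to [0, 1]; a constant axis maps to 0.0."""
    out = []
    for a in range(3):
        span = maxs[a] - mins[a]
        out.append((sample[a] - mins[a]) / span if span else 0.0)
    return out


def make_windows(samples, mins, maxs, length=20, stride=10):
    """Cut the stream into scaled windows of `length` samples x 3 axes.

    The stride is an assumption; the write-up does not state the overlap.
    """
    scaled = [scale(s, mins, maxs) for s in samples]
    return [scaled[i:i + length]
            for i in range(0, len(scaled) - length + 1, stride)]


# Synthetic 1-second recording at 50 Hz, values inside the +/-8 g range.
stream = [(-8.0 + 0.32 * i, 0.1 * i, 8.0 - 0.32 * i) for i in range(50)]
mins, maxs = fit_minmax(stream)
windows = make_windows(stream, mins, maxs)
print(len(windows), len(windows[0]), len(windows[0][0]))  # 4 20 3
```

In a real pipeline the minimum and maximum would be fitted on the training split only and reused for validation and test data, so that no information leaks across the split.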

Case study 2 — PAMAP2

Re-evaluated the same family of architectures on the PAMAP2 daily-activity benchmark to test generalization across sensor placements and activity types.

Case study 3 — WEAR

Adapted a Bangle.js gesture-recognition pipeline to the WEAR fitness dataset and compared a Baseline CNN, SeparableConv1D CNN, Multi-branch CNN and Multi-scale CNN in both FP32 and INT8. SeparableConv1D delivered the best accuracy-per-byte trade-off and was selected as the deployment candidate.
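
One reason a separable convolution tends to do well per byte is its parameter count. The sketch below compares a standard 1D convolution with a depthwise-separable one, assuming a depth multiplier of 1 and a bias on the output channels. The layer shape is hypothetical and not taken from the project.

```python
def conv1d_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of a standard 1D convolution."""
    return kernel * c_in * c_out + c_out


def separable_conv1d_params(kernel: int, c_in: int, c_out: int) -> int:
    """Depthwise (kernel * c_in) plus pointwise (c_in * c_out) weights, plus biases.

    Assumes a depth multiplier of 1.
    """
    return kernel * c_in + c_in * c_out + c_out


# Hypothetical layer shape chosen only for illustration.
k, c_in, c_out = 5, 32, 64
standard = conv1d_params(k, c_in, c_out)
separable = separable_conv1d_params(k, c_in, c_out)
print(standard, separable)  # 10304 2272
```

With int8 weights each parameter occupies one byte, so in this example the separable layer needs roughly a quarter of the weight storage of the standard one.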

Deployment & optimization

Models were trained in Keras, converted to TensorFlow Lite with post-training int8 quantization, and targeted at the TensorFlow Lite for Microcontrollers runtime. Because the physical device was unavailable, they were validated in the Bangle.js emulator (TFLite → JSON); inference latency stayed in the 7–27 ms range.
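
The idea behind int8 post-training quantization can be illustrated with a small affine quantizer in plain Python. This is a conceptual sketch under stated assumptions, not the converter's implementation: the helper names are made up, and a single scale and zero point are derived from a handful of calibration values rather than from a representative dataset.

```python
from typing import List, Tuple

QMIN, QMAX = -128, 127  # signed 8-bit range


def choose_qparams(values: List[float]) -> Tuple[float, int]:
    """Pick a scale and zero point so the observed range maps onto int8.

    The range is extended to include 0.0 so that zero is exactly representable.
    """
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (QMAX - QMIN) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(QMIN - lo / scale)
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(values: List[float], scale: float, zero_point: int) -> List[int]:
    """real -> int8, using q = round(real / scale) + zero_point, clamped."""
    return [max(QMIN, min(QMAX, round(v / scale) + zero_point)) for v in values]


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """int8 -> real, using real = scale * (q - zero_point)."""
    return [scale * (x - zero_point) for x in q]


# MinMax-scaled inputs live in [0, 1].
calibration = [0.0, 0.1, 0.25, 0.5, 0.75, 1.0]
scale, zp = choose_qparams(calibration)
q = quantize(calibration, scale, zp)
restored = dequantize(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(calibration, restored))
print(zp, q[0], q[-1], max_err <= scale / 2 + 1e-9)  # -128 -128 127 True
```

The round-trip error is bounded by about half a quantization step, which is why accuracy usually drops only slightly while weight storage shrinks to one byte per value.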

  • TinyML
  • CNN
  • LSTM
  • DeepConvLSTM
  • TensorFlow Lite Micro
  • Quantization
  • Bangle.js 2
  • Edge AI
Completed: Literature review for Semiconductor Electronics Design, EMINENT M.Sc. programme (January 2026).

Error Correction Codes Below 28 nm — Literature Review

Semiconductor Electronics Design literature review on adaptive Error Correction Codes (ECC) for sub-28 nm memories, covering soft errors, multi-cell upsets and automated management of ECC features across chip revisions.

DFMC efficacy: 99.956%
Energy saving vs static LPC: ~41.35%
Feature propagation precision: ~99%
References: 4 IEEE/ACM sources
  • Surveyed why sub-28 nm FinFET/GAAFET nodes are increasingly vulnerable to Single Event Upsets and Multi-Cell Upsets.
  • Compared Parity, Hamming (SEC-DED), Line Product Code (LPC) and Chipkill across reliability, area and energy.
  • Analyzed the Dynamic Fault-Tolerant Memory Controller (DFMC) — adaptive ECC at 28 nm CMOS, 1 V.
  • Studied automated propagation of ECC feature revisions in preprocessor-based Software Product Lines.
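
To make the SEC-DED entry in the comparison above concrete, here is a minimal sketch of a Hamming(7,4) code extended with an overall parity bit, which corrects any single-bit error and detects double-bit errors. It is a textbook construction on a 4-bit payload, chosen for brevity; it is not the code used in any of the reviewed papers, and real memories protect much wider words.

```python
from typing import List, Optional, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as 8 bits: Hamming(7,4) plus one overall parity bit."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    word = [p1, p2, d1, p4, d2, d3, d4]  # positions 1..7
    overall = 0
    for bit in word:
        overall ^= bit
    return word + [overall]


def decode(word: List[int]) -> Tuple[Optional[List[int]], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'double_error'."""
    bits = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos      # XOR of the positions of all set bits
    overall = 0
    for bit in bits:
        overall ^= bit           # 0 when the total parity is consistent
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single error and repair it.
        bits[(syndrome or 8) - 1] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with consistent overall parity: two bits flipped.
        return None, "double_error"
    return [bits[2], bits[4], bits[5], bits[6]], status


codeword = encode([1, 0, 1, 1])
single = list(codeword)
single[4] ^= 1                   # flip one bit
double = list(codeword)
double[0] ^= 1
double[5] ^= 1                   # flip two bits
print(decode(codeword), decode(single), decode(double))
```

Plain parity would only detect the single flip, and the double flip would pass unnoticed, which is the gap SEC-DED closes at the cost of extra check bits.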

The scaling challenge

As the critical charge (Q_crit) shrinks at 16 nm, 7 nm and below, even low-energy neutrons and alpha particles can cause bit flips. Manufacturing variation and aging mechanisms such as negative-bias temperature instability (NBTI) and hot-carrier injection (HCI) further erode reliability, making ECC a first-class architectural concern rather than an afterthought.

Adaptive ECC: the DFMC architecture

Stefani et al. (SBCCI 2023) propose a Dynamic Fault-Tolerant Memory Controller that switches between Parity, Hamming and LPC per memory block, driven by an evaluator and a threshold engine. In 28 nm CMOS at 1 V it reaches 99.956% correction efficacy while using ~41.35% less energy than a static LPC controller in low-error scenarios. The cost is area: the controller footprint grows from 1,446 µm² (no ECC) to 8,205 µm², roughly 5.7×.
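
The per-block switching idea can be sketched as a small threshold policy. Everything in this example is an assumption made for illustration: the class name, the sliding window of recent accesses and the threshold values are hypothetical and do not reproduce the evaluator or thresholds of the published controller.

```python
from collections import deque


class BlockEccSelector:
    """Hypothetical per-block policy: count recent errors, compare to thresholds."""

    def __init__(self, window=100, hamming_threshold=1, lpc_threshold=5):
        self.events = deque(maxlen=window)  # 1 = access saw an error, 0 = clean
        self.hamming_threshold = hamming_threshold
        self.lpc_threshold = lpc_threshold

    def record(self, error_seen: bool) -> None:
        """Log the outcome of one access to this memory block."""
        self.events.append(1 if error_seen else 0)

    def scheme(self) -> str:
        """Choose the cheapest scheme that matches the recent error count."""
        errors = sum(self.events)
        if errors >= self.lpc_threshold:
            return "lpc"
        if errors >= self.hamming_threshold:
            return "hamming"
        return "parity"


selector = BlockEccSelector(window=10)
history = []
# Three clean accesses, a burst of six errors, then twelve clean accesses.
for error_seen in [False] * 3 + [True] * 6 + [False] * 12:
    selector.record(error_seen)
    history.append(selector.scheme())
print(history[2], history[3], history[8], history[-1])  # parity hamming lpc parity
```

The block escalates to stronger protection during the error burst and falls back to parity once the errors age out of the window, which is where the energy saving in low-error scenarios would come from.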

Managing ECC features across chip revisions

Michelon et al. (SANER 2023) treat ECC variants as features in a Software Product Line gated by #ifdefs. Their tool mines feature commits, computes deltas between source and destination releases and propagates ECC implementations automatically. It reports ~99% precision and recall at ~63 s per propagation, avoiding error-prone manual integration of reliability logic.
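
As a rough illustration of the delta step, the sketch below extracts the lines guarded by one #ifdef feature from two releases of a C file and diffs them with Python's difflib. It is a toy under stated assumptions, not the tool from the paper: the feature name ECC_HAMMING, the C snippets and the helper function are invented, and #else or #elif branches are not handled.

```python
import difflib
from typing import List


def feature_lines(source: str, feature: str) -> List[str]:
    """Collect lines between `#ifdef <feature>` and its matching `#endif`."""
    collected, depth, inside_at = [], 0, None
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(("#ifdef", "#ifndef", "#if ")):
            depth += 1
            if inside_at is None and stripped.split() == ["#ifdef", feature]:
                inside_at = depth  # remember the nesting level of the guard
                continue
        elif stripped.startswith("#endif"):
            if inside_at == depth:
                inside_at = None   # matching #endif closes the feature block
                depth -= 1
                continue
            depth -= 1
        if inside_at is not None:
            collected.append(line)
    return collected


OLD = """int read(int a) {
#ifdef ECC_HAMMING
  check_hamming(a);
#endif
  return mem[a];
}"""

NEW = """int read(int a) {
#ifdef ECC_HAMMING
  check_hamming(a);
  log_correction(a);
#endif
  return mem[a];
}"""

old_feat = feature_lines(OLD, "ECC_HAMMING")
new_feat = feature_lines(NEW, "ECC_HAMMING")
# Keep only added or removed lines of the feature between the two releases.
delta = [d for d in difflib.ndiff(old_feat, new_feat) if d.startswith(("+ ", "- "))]
print(delta)  # ['+   log_correction(a);']
```

A propagation tool would then apply such a delta to the feature block of the destination release instead of merging whole files by hand.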

Outlook

The review argues for hardware-software synergy at sub-28 nm: ML-driven error prediction inside the memory controller, version-control-integrated feature propagation, and energy-aware codes like LPC replacing heavyweight Chipkill in high-density 3D memories.

  • Semiconductors
  • ECC
  • Reliability
  • FinFET
  • Memory Controllers
  • Research