Projects

Here is a sample of projects I've worked on during my studies. If you'd like more details or have ideas for extensions or future work, don't hesitate to drop me a line.

Machine Learning and Statistics

Graphical Modeling to study Conditional Dependencies in High Dimensional Data [Code: , Paper: ]

Understanding recovery patterns in stroke patients is crucial for improving clinical rehabilitation. Here we apply graphical models to high-dimensional time series and show that the approach is effective on stroke data.
Analysis of New York Times statistics for COVID-19 in the US [Code: ]

Data visualization is an incredibly powerful tool that can affect health policy decisions. Ensuring that visualizations are easy to interpret and, more importantly, that they convey accurate insights from the data is paramount for scientific transparency and for the health of individuals. As part of this project for the BS270 class at Harvard, we critically analysed COVID-19 visualizations and tables published by the New York Times.
Bivariate Causal Discovery via Conditional Divergence [Code: , Report: ]

Implementation of the CDCI algorithm from the paper Bivariate Causal Discovery via Conditional Divergence. The algorithm is a simple yet effective method for identifying the causal direction between two variables. The main idea relies on the assumption that the conditional distribution of the effect given the cause has the same shape for any value of the cause (although locations and scales may differ). Conditioning the cause on the effect, on the other hand, does not usually show this property.
3D Hand Pose Estimation [Code: , Report: ]

Hands are one of the main ways we interact with the world, and they could potentially act as an interface between the physical and the virtual world. We propose an EfficientNet encoder and a novel decoder based on separable convolutions to predict 3D hand keypoints from a monocular image. Our architecture performs well with modest computing power, as it has far fewer parameters to learn than other methods.
Analysis of the impact of diversity preserving mechanisms on Evolutionary Algorithms [Manuscript: ]

Empirical evidence suggests that diversity preserving mechanisms tend to improve the performance of evolutionary algorithms. Here we present a rigorous mathematical proof that this is not always the case: there are monotone functions where the duplicate avoidance mechanism changes the behaviour of the classical (2+1)-EA, such that its runtime shifts from quasi-linear to exponential.
ML talks [Code: ]

Slides from various presentations and talks I gave over the years.

Gene Regulatory Networks

The Network Zoo [Code (Python): , Code (R): , Website: , Paper: ]

The Network Zoo is a collection of open-source methods to infer gene regulatory networks (GRNs), conduct differential network analyses, estimate community structure, and explore the transitions between biological states.
Co-expression Batch Reduction Adjustment (COBRA) [Code: , Paper: , bioRxiv: ]

In this work, we demonstrate the persistence of confounders in covariance after standard batch correction using synthetic and real-world gene expression data examples. Subsequently, we introduce Co-expression Batch Reduction Adjustment (COBRA), a method for computing a batch-corrected gene co-expression matrix based on estimating a conditional covariance matrix. COBRA is computationally efficient, leveraging the inherently modular structure of genomic data to estimate accurate gene regulatory associations and facilitate functional analysis for high-dimensional genomic data.
GIRAFFE: a novel algorithm for Gene Regulatory Network inference [Code: , Manuscript: , Blog series: ]

Accurately estimating gene regulation is crucial to inform our understanding of diseases such as cancer. We develop GIRAFFE, a scalable matrix factorization-based algorithm that integrates prior knowledge to guide the optimization of a non-convex objective function. We demonstrate the effectiveness of this approach with extensive experiments on synthetic as well as real-world data. In particular, our algorithm outperforms state-of-the-art gene regulatory network inference methods in terms of accuracy, interpretability, scalability, and/or flexibility.
NetworkDataCompanion [Code: , Workflow: , Paper: ]

We present the NetworkDataCompanion (NDC), an R package that streamlines various steps in TCGA data processing. It filters and maps gene and sample identifiers between modalities, which is often a challenge with such heterogeneous data, and supports modality-specific data transformations such as normalization and cleaning. NDC enables all the pre-processing steps in tcga-data-nf, but it is also available as a standalone tool.
Netbooks [Code: , Paper: ]

A collection of Jupyter notebooks that provide detailed and annotated step-by-step case studies of the inference and analysis of gene regulatory networks.
GRN thresholding [Code: ]

Implementation of a collection of methods to threshold weighted (un)directed graphs, such that only the edges with largest (absolute) weight are retained.
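
As an illustration of the idea, here is a minimal sketch of one possible thresholding rule. It assumes the graph is stored as a dense NumPy adjacency matrix; the function name `keep_top_k_edges` and its parameters are hypothetical and are not taken from the repository.

```python
import numpy as np


def keep_top_k_edges(weights, k, directed=True):
    """Zero out all but the k edges with the largest absolute weight.

    Hypothetical helper for illustration only. For undirected graphs the
    matrix is assumed to be symmetric and each edge is counted once.
    """
    w = np.asarray(weights, dtype=float)
    if k <= 0:
        return np.zeros_like(w)

    # Rank edges by absolute weight; for undirected graphs, rank each
    # edge once by looking only at the upper triangle.
    scores = np.abs(w) if directed else np.triu(np.abs(w))
    k = min(k, scores.size)

    # Flat indices of the k largest scores (ties are broken arbitrarily).
    top = np.argpartition(scores.ravel(), -k)[-k:]
    mask = np.zeros(w.size, dtype=bool)
    mask[top] = True
    mask = mask.reshape(w.shape)

    if not directed:
        # Restore symmetry so that both (i, j) and (j, i) are kept.
        mask = mask | mask.T

    return np.where(mask, w, 0.0)


directed_graph = np.array([
    [0.00, 0.90, -0.10],
    [0.20, 0.00, -0.80],
    [0.05, 0.30, 0.00],
])
print(keep_top_k_edges(directed_graph, k=2))
```

Working on absolute values keeps strongly negative edges, such as repressive interactions in a signed network, on the same footing as strongly positive ones.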

Algorithms

Algorithms Lab [Code: ]

My solutions to the competitive programming challenges of the ETH course AlgoLab in 2020/2021. The solutions were written under time pressure to simulate exam conditions, so the coding style is suboptimal and not always consistent.
Scratch implementation of Algorithms and Data Structures in Java [Code: ]

Java implementations of the algorithms and data structures I learned in the first year of my undergraduate studies. To get the most out of the exercise, I mostly avoided external libraries and implemented everything from scratch.

Blog

Summaries [Repository: ]

Machine Learning

Covers various basic topics in Machine Learning: bias-variance trade-off, regularization, techniques for model selection, SVMs, neural networks (backpropagation, CNNs, GANs, ...), dimensionality reduction, algorithms for clustering, EM algorithm, ...
Parallel Programming

A journey through very diverse topics in parallel programming: from theoretical concepts (Amdahl's law, sequential consistency, consensus) to MPI, by way of race conditions, task graphs, and different ways to ensure mutual exclusion (locks, atomic operations, transactional memory).
Algorithms and Data Structures

Asymptotic notation, divide and conquer, sorting and searching, graph algorithms, (un)balanced search trees... A great toolkit to solve algorithmic problems!
Reproducible Data Science

Useful topics applicable to every Data Science project: reproducibility vs replicability, case studies in reproducible research, data provenance, statistical methods for reproducible science, and useful computational tools.
Systems Programming and Computer Architecture

The core concepts for every Computer Scientist: bit hacks, binary representation, memory allocation, caches, assembly, buffer overflow, computer architectures, and more!
Algorithms and Probability

One of my favorite courses ever: an introduction to core concepts in discrete probability (random variables, their moments, useful bounds) and their application to randomized algorithms. Application fields for Monte Carlo and Las Vegas algorithms include graph theory (min cuts, bootstrapping, longest paths) and geometry (smallest enclosing circle, Jarvis March, convex hull).
Financial Economics

A mathematical approach to option contracts, arbitrage, market equilibria, and risk management models.
Cryptography

Basic concepts for public/private key encryption, decryption, and authentication: hash functions, the Diffie-Hellman protocol, RSA, zero-knowledge proofs, and commitment schemes.
Visual Computing

A primer on non-ML approaches to Computer Vision: image segmentation, convolutions, image vs frequency domain, optical flow, and physically based animations.
Fundamentals of Economics

My notes on the book Economics 101 by Alfred Mill, which I studied out of personal interest as a first step towards possibly approaching the world of investments.

Posts

Master's Thesis at Harvard, my personal experience

A collection of practical insights based on my personal experience for people interested in visiting Boston in the context of a (Master's) thesis.
Multiple Testing, an overview of different decision rules: from Bonferroni to Benjamini-Hochberg

Single hypothesis testing is a famous topic in classical statistics. Here we discuss multiple ways to extend the basic framework to handle multiple hypotheses simultaneously.
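
To make the two endpoints named in the title concrete, here is a minimal sketch of the Bonferroni and Benjamini-Hochberg decision rules. The function names and the example p-values are illustrative assumptions and are not taken from the post.

```python
import numpy as np


def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the family-wise error rate)."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha / p.size


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up rule controlling the false discovery rate.

    Sort the p-values, find the largest rank k (1-based) such that
    p_(k) <= alpha * k / m, and reject the k smallest p-values.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds

    reject = np.zeros(m, dtype=bool)
    if below.any():
        last = np.nonzero(below)[0].max()  # 0-based index of that largest rank
        reject[order[: last + 1]] = True
    return reject


# Made-up p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print("Bonferroni rejections:", int(bonferroni(p_values).sum()))
print("Benjamini-Hochberg rejections:", int(benjamini_hochberg(p_values).sum()))
```

On this made-up input Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects two, which illustrates that the latter is the less conservative rule: it controls the expected proportion of false discoveries rather than the probability of making any false discovery at all.
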
GLUE, an application of (Graph) Variational Autoencoders in Computational Biomedicine

High-throughput technologies have made it possible to collect huge amounts of data from different “omics”. However, computational integration is essential to extract as much information as possible about the underlying functions of different cells. Here we show how GLUE maps multi-omics data from different cells to a common embedding space with highly desirable properties.
Dimensionality Reduction “behind the scenes”: how do the most popular algorithms work?

Low dimensional embeddings are an essential tool in Data Science, with applications ranging from visualization to dataset pre-processing in high-dimensional contexts. Here we summarize four of the most popular dimensionality reduction algorithms: PCA, ICA, t-SNE, and UMAP.
SHOPPER: a Probabilistic Model of Consumer Choice

A success story of collaboration between different scientific communities: the power of Machine Learning at the service of econometrics. SHOPPER uses variational inference to fit its latent parameters, which yields valuable insights about both products and customers.
