bilha analytics

Multicollinearity and VIF

The general setup for your regression problem may look something like below. The model coefficients ($\beta_i$) can then be interpreted as the amount of change in your dependent variable $y$ that results from a unit change in the corresponding predictor variable ($X_i$). A problem arises when there are significant correlations between your predictor variables, so that a change in one such variable affects not only $y$ but also the other correlated predictors. This distorts the estimated coefficients and makes their interpretation difficult. This is the problem of multicollinearity.

\[y = \beta_0 + \sum_{i=1}^{p} \beta_i X_{i} + \epsilon\]
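To see the effect, here is a minimal simulation (not from the original article) in which one predictor is almost a copy of the other; the fit itself is fine, but the individual coefficients come with large standard errors.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200

# x2 is nearly a copy of x1, so the two predictors are highly correlated
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
y = 2 * x1 + 3 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

# Predictions and R^2 look good, but the coefficient standard errors are inflated
print(fit.params)
print(fit.bse)
```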

Implications of multicollinearity: Multicollinearity may not affect model accuracy much; it is mainly a concern when interpreting model coefficients. If you need to speak to the importance of a feature in a linear model, then multicollinearity is something to watch out for. Affected models include linear regression and SVMs with a linear kernel.

Variance Inflation Factor (VIF) is one way to quantify multicollinearity. It measures how much the variance (and thus the standard error) of a model coefficient is inflated. The VIF of the $i$th coefficient is computed as below, where $R^2_i$ is the $R^2$ of the model obtained by regressing the $i$th predictor variable on the remaining predictor variables.

\[VIF_{i} = \frac{1}{1 - R^2_i}\]
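In practice you rarely fit the $p$ auxiliary regressions by hand. Below is a minimal sketch using statsmodels' `variance_inflation_factor`, assuming your predictors live in a pandas DataFrame `X`; the `vif_table` helper name is just for illustration.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X: pd.DataFrame) -> pd.Series:
    """Return the VIF of each predictor column in X."""
    # statsmodels expects the full design matrix, including the intercept
    design = sm.add_constant(X)
    vifs = {
        col: variance_inflation_factor(design.values, i)
        for i, col in enumerate(design.columns)
        if col != "const"
    }
    return pd.Series(vifs, name="VIF")
```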

How to use it

The ideal VIF value is 1, indicating no inflation of the standard errors ($SE$) and therefore no multicollinearity for that predictor variable. Another way to think about it is that VIF is a multiplier on the variance, so $\sqrt{VIF}$ is the multiplier on the standard errors of the model coefficients. When $VIF = 1$, the standard error is $1 \times SE$, i.e. no inflation. If $VIF = 4$, then $\sqrt{4} = 2$, meaning the standard error of that coefficient is twice as large as it would be if there were no multicollinearity with the other predictor variables.
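To make that arithmetic concrete, the snippet below turns some made-up VIF values into standard-error inflation factors and flags the larger ones; the cut-off of 5 (some authors use 10) is a commonly cited rule of thumb, not a hard rule.

```python
import numpy as np
import pandas as pd

# Hypothetical VIF values for three predictors
vifs = pd.Series({"x1": 1.0, "x2": 4.0, "x3": 12.0}, name="VIF")

report = pd.DataFrame({
    "VIF": vifs,
    # sqrt(VIF) is the inflation factor on the coefficient's standard error
    "SE_inflation": np.sqrt(vifs),
    # VIF > 5 (or 10) is a common rule-of-thumb warning level
    "flag": vifs > 5,
})
print(report)
```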

read more

Matplotlib styles

This entry assumes that you’re already familiar with Python, matplotlib and seaborn and are looking to be more productive when using these tools for your research work. If you’re looking for introductory coding material, there are a few links at the end of the article to get you started. All the same, this entry should still frame things well enough that you can then go into the specific coding tutorials.

The idea here is to set up a reusable theme/style and find suitable settings for publication-quality plots. That way, you have consistent styling in your plots and, of course, by scripting your process, it is easier to update your report as your experiments or your output media change.
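As one possible starting point (the specific values here are illustrative, not the article's recommended settings), a shared style can be set once at the top of a script or notebook and every subsequent plot picks it up.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# One possible set of publication-oriented defaults; adjust to your venue
sns.set_theme(context="paper", style="whitegrid", font="serif")
plt.rcParams.update({
    "figure.figsize": (3.5, 2.5),   # roughly single-column width in inches
    "figure.dpi": 300,
    "font.size": 9,
    "axes.labelsize": 9,
    "legend.fontsize": 8,
    "savefig.bbox": "tight",
})

# Any plot created after this point uses the shared styling
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], marker="o")
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("example.png")
```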

read more

Retrieval-based Chatbot

Chatbots automatically answer common or well-known questions in a manner that simulates conversational interaction. In this project, we build a retrieval-based chatbot that uses cosine similarity over a database of frequently asked questions about COVID-19, as of 31-Mar-2020.
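The project's own pipeline isn't reproduced here, but the core retrieval step can be sketched as below: embed the FAQ questions and the user query with TF-IDF, then return the answer whose question is most similar by cosine similarity. The two-entry FAQ list is a made-up stand-in for the actual database.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for the FAQ database described in the project
faq = [
    ("What are the symptoms of COVID-19?",
     "Common symptoms include fever, cough and fatigue."),
    ("How does COVID-19 spread?",
     "Mainly through respiratory droplets from an infected person."),
]

questions = [q for q, _ in faq]
vectorizer = TfidfVectorizer()
question_vectors = vectorizer.fit_transform(questions)

def answer(user_query: str) -> str:
    # Embed the query in the same TF-IDF space and pick the closest FAQ question
    query_vector = vectorizer.transform([user_query])
    scores = cosine_similarity(query_vector, question_vectors)[0]
    return faq[scores.argmax()][1]

print(answer("how is the virus transmitted"))
```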

read more