Current projects


LS314 Project

The LS314 Project is my startup vehicle. It does not have a specific business proposition; instead, it formalizes my work on projects that interest me, that I believe have potential value, and that could be converted into business propositions in the future.

The main goal of the LS314 project is to develop a methodology: a set of procedures and rules that will allow me to work efficiently on new ideas and projects. That is, I want to make sure that by following this methodology, I will be able to:

  • Avoid wasting time on projects with little value.
  • Plan and execute projects within a specified timeframe, with concrete outcomes.
  • Increase and document my validated learning.

Some of the projects within the LS314 project are:

  • Communication Server - a standalone, production-ready web application that provides WebSocket-based real-time communication to client applications.
  • Surycate Bot - a research project exploring the use of context prompting for LLM-based automation of terminal tasks.
  • Completion App - a simple LLM-based app focused on executing repetitive tasks and using past executions to guide and improve future completions.

For more information about the LS314 project, please visit the LS314 website.


Model-based Distributionally Robust Optimization

Distributionally Robust Optimization (DRO) algorithms are a class of optimization methods that aim to solve an optimization problem under uncertainty regarding the underlying probability distribution.

The goal of this project is to extend the established Kullback-Leibler divergence-based DRO to policy optimization problems, specifically contextual bandits that include a model of the environment, i.e., a parametric function governing the relation between the context, the agent's actions, and the rewards.

The rationale behind the project is to ensure that the static strategy developed on the offline data is robust to potential changes in the environment. These changes can include variations in the distribution of the contexts as well as changes in the reward function.
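
To make the objective concrete, the KL-based DRO problem for a parametric policy can be sketched as follows (the notation is my own simplification, not the project's exact formulation):

    \max_{\theta} \; \min_{Q \,:\, D_{\mathrm{KL}}(Q \,\|\, P_0) \le \rho} \; \mathbb{E}_{x \sim Q}\left[ r\left(x, \pi_\theta(x)\right) \right]

Here P_0 is the nominal distribution estimated from the offline data, rho is the radius of the KL ball of admissible distributions, pi_theta is the parametric policy, and r is the model-based reward function; the inner minimization over Q is what provides robustness to the distribution shifts described above.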


Past projects


Pseudo-Collusion Due to Algorithmic Pricing with Transient Demand Response

This project is a side effort within my PhD research. Its goal was to analyze the impact of delayed demand response on the pricing equilibrium reached by autonomous pricing agents.

The project had two key aspects. First, the pricing agents were quasi-bandit algorithms with limited memory, and their exploration was restricted to a neighborhood of the current price. This constraint reflects real-world conditions, where quoted prices cannot fluctuate too drastically. Second, the demand response was modeled as a transient reaction to price changes. Instead of responding immediately, demand adjusted with a delay, depending on the prices from the previous step.
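
As a minimal sketch of this kind of delayed adjustment (the linear demand curve, parameter values, and adjustment rule below are illustrative assumptions, not the exact specification used in the project):

    def static_demand(price, a=10.0, b=1.5):
        # Hypothetical long-run demand curve, linear in price.
        return max(a - b * price, 0.0)

    def transient_demand(prev_demand, price, adjustment=0.5):
        # Demand moves only partway toward its long-run level each step,
        # so current sales still reflect previously quoted prices.
        target = static_demand(price)
        return prev_demand + adjustment * (target - prev_demand)

    # After a price increase, demand erodes gradually instead of dropping at once.
    demand = static_demand(2.0)
    for t in range(5):
        demand = transient_demand(demand, price=4.0)
        print(f"step {t}: demand = {demand:.2f}")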

We found that in this setting, autonomous pricing agents can reach a pseudo-collusive equilibrium, where prices exceed the competitive levels predicted by a one-step Nash equilibrium. This occurs because the agents fail to accurately infer the relationship between their actions and rewards.

Additionally, we observed that even without a transient demand response, the agents still converged to supra-competitive equilibria. This phenomenon arises from the particular price-reward dynamics of pricing games. When the agents, due to exploration, simultaneously quote higher prices, they both receive higher rewards, reinforcing a strong positive incentive for price increases. Conversely, if both quote lower prices, they receive a negative signal. Although asymmetric price changes generate signals that encourage competition, the symmetric reinforcement effect alone is sufficient to sustain a dynamic equilibrium with supra-competitive pricing.
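
A toy simulation of this feedback loop (the duopoly demand specification, step size, and acceptance rule below are my own simplifications, not the setup used in the study) could look like this:

    import random

    def profit(own_price, rival_price, cost=1.0, a=10.0, b=2.0, c=1.0):
        # Hypothetical linear duopoly demand: raising one's own price lowers
        # demand, while a higher rival price raises it.
        demand = max(a - b * own_price + c * rival_price, 0.0)
        return (own_price - cost) * demand

    # Two memory-limited agents: each remembers only its most recent payoff
    # and explores within a small neighborhood of its current price.
    prices = [3.0, 3.0]
    last = [profit(prices[0], prices[1]), profit(prices[1], prices[0])]
    for _ in range(10_000):
        trial = [p + random.choice([-0.1, 0.0, 0.1]) for p in prices]
        new = [profit(trial[0], trial[1]), profit(trial[1], trial[0])]
        for i in range(2):
            if new[i] >= last[i]:  # keep the move if the payoff did not drop
                prices[i] = trial[i]
            last[i] = new[i]
    # Compare the resulting prices with the one-step Nash price (p = 4 for
    # these parameters) to check for supra-competitive pricing.
    print(prices)

Because each agent compares payoffs across steps in which the rival also moved, simultaneous upward moves look individually profitable, which is exactly the misattribution described above.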

This research was presented at the 2024 Conference for Institution and Mechanism Design in Budapest.

A manuscript is available here.


Multi-agent insurance pricing using model-based bandits

This project is part of my PhD research. The goal was to study the problem of pricing insurance premiums with bandit algorithms in multi-agent environments. Within the project, we developed a pricing environment using a bottom-up approach and proved the existence of a Nash equilibrium for the single-stage pricing competition. Furthermore, based on an analysis of the payoff functions, we derived a set of assumptions that guarantee the uniqueness of the Nash equilibrium in pure strategies.

We then performed numerical experiments with bandit algorithms competing within the environment. For that purpose, we developed a logistic model of the environment that can be incorporated into Bayesian bandit algorithms.
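
As a rough sketch of how such a logistic model can sit inside a Bayesian, Thompson-sampling-style bandit (the functional form, price grid, and Gaussian posterior approximation below are illustrative assumptions; the posterior update from observed sales is omitted):

    import numpy as np

    rng = np.random.default_rng(0)

    def conversion_prob(price, alpha, beta):
        # Hypothetical logistic model: the probability that a customer
        # accepts the quoted premium decreases with the price.
        return 1.0 / (1.0 + np.exp(-(alpha - beta * price)))

    prices = np.linspace(0.5, 3.0, 26)   # candidate premiums
    post_mean = np.array([2.0, 1.5])     # posterior over (alpha, beta)
    post_std = np.array([0.5, 0.3])

    def thompson_price(cost=0.5):
        # Sample a plausible environment from the posterior, then quote the
        # price that maximizes expected profit under the sampled model.
        alpha, beta = rng.normal(post_mean, post_std)
        expected_profit = (prices - cost) * conversion_prob(prices, alpha, beta)
        return prices[np.argmax(expected_profit)]

    print(thompson_price())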

The numerical experiments showed that while the model can accelerate the learning process when the environment and all opponents are stationary, it does not provide any advantage over the standard bandit algorithm when the opponents are also learning.

Therefore, any model of the environment used by the learning agent must faithfully represent reality, i.e., both the pricing environment and the possibly dynamic behavior of the opponents.

A paper based on this project is available at link.springer.com/article/10.1007/s10614-024-10816-w.