Past Projects
Alpha/AlphaZ Code Generator
The AlphaZ system is an open-source tool for analyzing, transforming, and parallelizing programs within the polyhedral equational model of computing. Programs are specified in a declarative, equational language called "Alpha". Users specify transformations, optimizations, and schedules programmatically, and AlphaZ then generates efficient C code to perform the desired computations.
Part of my work has been to create a new C code generator in the current version of AlphaZ, as directly porting the generator from the previous version was not possible.
Links:
- Current AlphaZ GitHub: https://github.com/CSU-CS-Melange/alpha-language
- Original AlphaZ GitHub: https://github.com/CSU-CS-Melange/AlphaZ
- Original AlphaZ Wiki: https://www.cs.colostate.edu/AlphaZ/wiki/doku.php
Reuse Analysis via Affine Factorization
In January 2024, my paper "Reuse Analysis via Affine Factorization", written under the advisement of Prof. Sanjay Rajopadhye, was published at the IMPACT workshop. The paper details an algorithm that factorizes affine functions to expose reuse among sub-expressions. We have incorporated this algorithm into the AlphaZ system and will use it to automate an optimal reduction simplification algorithm.
Links:
- IMPACT Paper: https://impact-workshop.org/impact2024/papers/paper3.pdf
- GitHub Repo: https://github.com/ryanjob42/FactorizingAffineMaps
Biology Research Pipelines
A primary aspect of my work at the Data Science Research Institute has been developing pipelines for analyzing biological data. The first of these pipelines distributes RNA sequence mapping using a combination of STAR and Samtools. It is designed to process hundreds of long-read RNA sequencing results, some of which require upwards of 2 TB of RAM and several hours to compute on our local HPC cluster.
The second pipeline we've developed is for processing mass spectrometry results using msconvert and mzmine. By leveraging our HPC cluster, we aim to significantly improve processing times and enable reproducible research via Singularity containerization.
HPC Cluster Usage Guest Lecture
In the Spring 2024 semester, I taught a guest lecture on the use of HPC clusters for the Parallel Programming/Processing course. As part of this lecture, I developed a demonstration of PyTorch's MPI-like communications library and its distributed model training libraries.
Links:
- GitHub Repo: https://github.com/ryanjob42/CSX75-HPC-Demo
Super-EGO Term Project
As a term project in my database management systems course at Colorado State University, I led a small team in an effort to optimize "Super-EGO", an efficient implementation of a similarity join published in 2013. By experimenting with different compilers and optimization techniques, we achieved a 1.4x speedup over the existing implementation.
DeepCoder/PCCoder Term Project
For my machine learning term project at Colorado State University, I explored the use of machine learning techniques in program synthesis. I used the open-source implementation of PCCoder as a baseline; PCCoder builds on a prior closed-source work called "DeepCoder".
Both tools take several input/output example pairs, then attempt to synthesize a program in their custom domain-specific language (DSL) that correctly reproduces the outputs from the given inputs. They train a neural network to predict which instructions are most likely to be needed, and use those predictions to guide the program synthesis search.
In this project, I modified the DSL to remove redundant instructions. For example, I removed the "multiply by 2" instruction, since it can be implemented by adding a value to itself. I hypothesized that removing these instructions would shrink the search space and speed up program synthesis. Ultimately, this yielded only a 0.22% improvement.
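To illustrate why removing a redundant instruction shrinks the search space, here is a minimal sketch using a hypothetical toy DSL. The instruction names (`mul2`, `add_self`, `inc`, `neg`) and helpers (`search_space_size`, `synthesize`) are illustrative assumptions, not the actual DeepCoder/PCCoder instruction set or code, and the search is plain unguided enumeration rather than the network-guided search those tools use.

```python
from itertools import product

# Hypothetical toy DSL (NOT the actual DeepCoder/PCCoder instruction set):
# each instruction maps a list of ints to a new list of ints.
def mul2(xs):
    return [2 * x for x in xs]

def add_self(xs):
    return [x + x for x in xs]  # same result as mul2 for every input

def inc(xs):
    return [x + 1 for x in xs]

def neg(xs):
    return [-x for x in xs]

FULL_DSL = {"mul2": mul2, "add_self": add_self, "inc": inc, "neg": neg}
# Pruned DSL: drop the redundant "multiply by 2" instruction.
PRUNED_DSL = {name: fn for name, fn in FULL_DSL.items() if name != "mul2"}

def search_space_size(num_instructions, length):
    """Number of straight-line programs with exactly `length` steps."""
    return num_instructions ** length

def run(program, xs):
    """Apply each instruction in order to the input list."""
    for fn in program:
        xs = fn(xs)
    return xs

def synthesize(dsl, examples, max_len):
    """Unguided enumerative search: return the names of the first
    (shortest) program consistent with every input/output example."""
    for length in range(1, max_len + 1):
        for steps in product(dsl.items(), repeat=length):
            fns = [fn for _, fn in steps]
            if all(run(fns, inp) == out for inp, out in examples):
                return [name for name, _ in steps]
    return None

EXAMPLES = [([1, 2, 3], [3, 5, 7])]

if __name__ == "__main__":
    print(search_space_size(len(FULL_DSL), 5))    # 1024
    print(search_space_size(len(PRUNED_DSL), 5))  # 243
    print(synthesize(FULL_DSL, EXAMPLES, 3))      # ['mul2', 'inc']
    print(synthesize(PRUNED_DSL, EXAMPLES, 3))    # ['add_self', 'inc']
```

Both DSLs still find an equivalent program, but the pruned one enumerates 3^5 = 243 five-step programs instead of 4^5 = 1024. That count is a worst-case bound for unguided enumeration; a learned model that ranks instructions does not search the space uniformly, so the practical savings need not be proportional.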
Redistricting Project
During the summer of 2023 at Colorado State University, I assisted Prof. Ewan Davies in leading an undergraduate research project. The students' goal was to study how Markov chain Monte Carlo (MCMC) techniques could be applied to analyze voting districts and inform redistricting efforts. We helped the students learn the topics, develop code to test their theories, and create a research poster describing their work.
Links:
- Prof. Davies' Description: https://www.ewandavies.org/courses/redictricting