I developed a Retrieval-Augmented Generation (RAG) pipeline that uses LangChain's PyPDFLoader to take any PDF as input. After splitting the document into chunks and converting them to vectors, it stores the embeddings in a local vector store. A simple LangChain chain with context and prompt engineering then summarizes the PDF content using GPT-4. Finally, the output is parsed and converted to a PowerPoint presentation using the python-pptx library.
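The chunk, embed, store, and retrieve steps can be sketched in a few lines of plain Python. This is only an illustration and not the project's code: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for the local vector store, and the function names, chunk sizes, and sample text are all assumptions made for the example.

```python
import math
from collections import Counter


def split_words(text, chunk_size=8, overlap=2):
    """Split text into overlapping word windows (sizes are illustrative)."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def embed(text):
    """Toy embedding: a word-count vector. A real pipeline would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, store, k=1):
    """Return the k stored chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Hypothetical document text standing in for the pages loaded from a PDF.
document = (
    "Asteroids are small rocky bodies that orbit the sun. "
    "Most asteroids are found in the main belt between Mars and Jupiter. "
    "Redshift measures how much light from a galaxy is stretched."
)

chunks = split_words(document)
store = [(chunk, embed(chunk)) for chunk in chunks]  # in-memory "vector store"
top = retrieve("where are asteroids found", store, k=1)
print(top[0])
```

In a full pipeline, the retrieved chunks would be placed into the prompt as context before the summarization call.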
Alongside another developer, I set out to solve a couple of problems in our school's lunch system. We recorded student feedback about various lunch items and created a website that predicts and optimizes inventory distribution using the data we recorded. I specifically worked on the Python back-end system, including the numerous data visualizations, the Google Sheets and Nutrislice API integrations, and the prediction system.
Link to Website
I used artificial neural networks to predict asteroid hazard level based on diameter, eccentricity, oscillation, and other factors, trained on JPL asteroid data. I compared a simple model with a more complex, optimized model and observed that both achieved excellent accuracy (99.59% compared with 99.7%). I also analyzed both models' accuracy, precision, recall, and F1-score by plotting confusion matrices, loss curves, and ROC curves.
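To show how accuracy, precision, recall, and F1-score relate to the confusion matrix, here is a minimal sketch in plain Python. The labels are made up for illustration (1 = hazardous, 0 = not hazardous) and are not drawn from the JPL data or from either model; the function name is likewise an assumption.

```python
def classification_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and the metrics derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "confusion": {"tp": tp, "tn": tn, "fp": fp, "fn": fn},
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: two hazardous objects among ten, one of them missed.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case accuracy is 0.9 while recall is only 0.5, which is why it helps to look beyond accuracy when the positive class is rare.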
Link to Kaggle Journal
Alongside two other developers, I created an app that optimizes energy consumption and generation as a solution to California's infamous Duck Curve problem. Our solution featured an ML-powered dashboard that provided data analytics, predictions, and a distribution model based on a simulation of real-world data. I worked on the Python Flask backend, the LightGBM regression model, data visualizations using Plotly, and a simple Large Language Model feature using OpenAI.
Link to Website (may take time to load)
As a Research Intern at the Cambridge Center for International Research, I wrote a research paper on the potential of Random Forest regression models to predict celestial redshift from near-infrared data taken by the James Webb Space Telescope. As part of my research, I used data augmentation and cross-validated search algorithms, and evaluated my models with loss curves and residual plots.
Link to Overleaf Paper