AMD Reinforcement Learning Hackathon
← Back to Blogs | Back to Home

  Story

Along with my labmates, I participated in a 2-day Reinforcement Learning workshop organized by AMD and Unsloth, which also included a 24-hour hackathon. We chose the AI Premier League track, where the goal was to build two agents: a question agent that could pose genuinely difficult questions, and an answer agent that could reliably tackle such challenging queries. Each team’s question agent would attack the opposing team’s answer agent, and vice versa, with a final tournament deciding the winner.

Our team of four – Aayush Tyagi, Ashish Goswami, Namasivayam K, and myself – sat down early on to chart a possible path to the solution. We started enthusiastically, powered by a massive AMD MI300 GPU (192 GB) and our beloved Copilot (Claude). We tried several approaches; some worked, some failed, and we kept iterating as a team. Time flew, and before we realized it, the clock hit 2 a.m. We were close to finishing dataset curation but still had to implement the GRPO pipeline for the agents. Then came the dark phase, roughly between 3:30 a.m. and 6:30 a.m., where nothing seemed to work. Around 7 a.m., we finally started seeing positive results. After one short sleep cycle, I jumped back to the laptop to continue.

We then simulated a mini tournament with our own agents playing against each other. After multiple trials, we decided to field our SFT and GRPO-tuned Qwen 2.5 14B models for the actual tournament – fingers crossed, but genuinely confident that our agents could hold their own. At the award ceremony, we waited eagerly for the results. To our disappointment, we did not make it through to the podium. Out of 59 teams, we finished in the top 14, which was respectable, but still short of our expectations. We skipped dinner and went straight back to our hostels. Namas and I found ourselves questioning our existence and expertise a bit – after all, we were a team of four senior PhD students.

The disappointment was real, but my optimistic side could not ignore what we had gained. We had become familiar with the complete LLM fine-tuning cycle: from synthetic data curation and model selection to fine-tuning and evaluation. I deeply appreciate that my teammates were willing to pull an all-nighter and push hard toward a shared goal. This hackathon made me realize the true value of a single day – if used well, 24 hours is enough to learn a lot, build something meaningful, and gather experiences that stay with you.

  Highlights
AMD Hackathon Photo 1
AMD Hackathon Photo 2

Google Scholar | Email | X | LinkedIn