I'm a Staff Machine Learning Engineer at Emergence, where I work on natural language processing with a focus on developing robust evaluation metrics for large language models. I previously did research at Taiger Singapore on knowledge graph generation and deep active learning systems. My work centers on understanding and improving how we evaluate AI models, particularly for tasks like text summarization and knowledge extraction. I'm especially interested in prompt-based evaluation techniques versus traditional metrics, and in how we can make language models more reliable and interpretable. In recent work, I've been exploring ways to evaluate LLM-based metrics systematically and to compare different evaluation paradigms, such as prompt-based and likelihood-based approaches.
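To give a flavor of the contrast between those two paradigms, here is a minimal sketch (not my production code) using a small open model from Hugging Face; the model name "gpt2", the prompt wording, and the example texts are illustrative placeholders, not the setup from my actual work.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

source = "The committee approved the new budget after a lengthy debate."
summary = "The committee approved the budget."

# Prompt-based evaluation: ask the model to rate the summary directly,
# then parse whatever it generates as the judgment.
prompt = (
    f"Source: {source}\nSummary: {summary}\n"
    "Rate the summary's faithfulness from 1 to 5. Rating:"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=3, do_sample=False)
rating_text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:])
print("prompt-based rating (raw):", rating_text.strip())

# Likelihood-based evaluation: score the summary by its average log-probability
# conditioned on the source, without asking the model for an explicit verdict.
context_ids = tokenizer(f"Source: {source}\nSummary:", return_tensors="pt").input_ids
summary_ids = tokenizer(" " + summary, return_tensors="pt").input_ids
input_ids = torch.cat([context_ids, summary_ids], dim=1)
labels = input_ids.clone()
labels[:, : context_ids.shape[1]] = -100  # mask the context so only summary tokens are scored
with torch.no_grad():
    loss = model(input_ids, labels=labels).loss  # mean negative log-likelihood of summary tokens
print("likelihood-based score (avg log-prob):", -loss.item())
```

The interesting question, and part of what I study, is when these two kinds of scores agree, and which one tracks human judgments more faithfully.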
I have broad research interests spanning machine learning, natural language processing, and knowledge representation. Some key areas I work on include knowledge graph generation, active learning, evaluation metrics for generative AI, and making language models more robust and trustworthy. I aim to develop AI systems that can reliably understand and reason about language while being transparent about their capabilities and limitations.