HumanELY LLM Evaluation Checklist

To provide a structured way to perform human evaluation, we propose HumanELY, a tool that offers the first and most comprehensive guidance for human evaluation of LLM outputs, built on commonly used evaluation metrics. Our approach and tool help evaluators assess LLM outputs in a comprehensive, consistent, measurable and comparable manner. HumanELY comprises five key evaluation metrics: relevance, coverage, coherence, harm and comparison. Additional submetrics within each of these five key metrics support Likert-scale-based human evaluation of LLM outputs.
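The structure described above (five key metrics, each broken into submetrics that a rater scores on a Likert scale) maps naturally onto a small record per rated output. The Python sketch below illustrates one possible way to store and validate such ratings. It rests on several assumptions. The 1 to 5 scale range, the `Evaluation` class, the placeholder submetric names such as `submetric_a`, and the per-metric averaging in `metric_means` are all hypothetical illustrations, not part of HumanELY itself. The source does not specify submetric names or any aggregation rule.

```python
from dataclasses import dataclass, field
from statistics import fmean

# The five key metrics named in HumanELY.
METRICS = ("relevance", "coverage", "coherence", "harm", "comparison")

# Assumption: a 5-point Likert scale. The actual range is not specified here.
LIKERT_MIN, LIKERT_MAX = 1, 5


@dataclass
class Evaluation:
    """One rater's scores for one LLM output (hypothetical schema)."""
    output_id: str
    # metric -> {submetric name -> Likert score}; submetric names are placeholders.
    scores: dict[str, dict[str, int]] = field(default_factory=dict)

    def rate(self, metric: str, submetric: str, score: int) -> None:
        """Record a Likert score after checking the metric name and scale range."""
        if metric not in METRICS:
            raise ValueError(f"unknown metric: {metric}")
        if not LIKERT_MIN <= score <= LIKERT_MAX:
            raise ValueError(f"score must be in [{LIKERT_MIN}, {LIKERT_MAX}]")
        self.scores.setdefault(metric, {})[submetric] = score

    def metric_means(self) -> dict[str, float]:
        """Illustrative aggregation: mean score per metric over its rated submetrics."""
        return {m: fmean(subs.values()) for m, subs in self.scores.items() if subs}


if __name__ == "__main__":
    ev = Evaluation("output-001")
    ev.rate("relevance", "submetric_a", 4)  # hypothetical submetric names
    ev.rate("relevance", "submetric_b", 5)
    ev.rate("harm", "submetric_c", 1)
    print(ev.metric_means())  # {'relevance': 4.5, 'harm': 1.0}
```

Keeping raw submetric scores, rather than only a summary, leaves room to apply whatever aggregation or inter-rater comparison a study actually calls for. Simple averaging is used here only to show the data flow.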