Data Quality Dimensions for AI

Core Contribution

Framework for assessing data quality specifically in AI/ML contexts. Extends traditional data quality dimensions with AI-specific considerations.

Traditional Dimensions Applied

  • Accuracy, completeness, consistency, timeliness (standard DQ)
  • Additional focus on label quality, representativeness

AI-Specific Dimensions

  • Label Quality: Annotation accuracy, inter-annotator agreement
  • Representativeness: Coverage of target distribution
  • Provenance: Source documentation for audit
  • Freshness: Temporal relevance for deployment context
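
Two of these dimensions can be measured directly. The sketch below is one possible operationalization, not something the framework prescribes: Cohen's kappa as an inter-annotator agreement score for label quality, and total variation distance between a sample's category mix and a target distribution as a rough representativeness check. The function names and the choice of these two measures are assumptions made for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


def total_variation(sample_labels, target_dist):
    """Total variation distance between a sample's category frequencies
    and a target distribution (0 = identical, 1 = disjoint)."""
    if not sample_labels:
        raise ValueError("sample must not be empty")
    n = len(sample_labels)
    counts = Counter(sample_labels)
    categories = set(counts) | set(target_dist)
    return 0.5 * sum(
        abs(counts[c] / n - target_dist.get(c, 0.0)) for c in categories
    )
```

A kappa near 1 indicates strong agreement beyond chance, and a distance near 0 indicates that the sample's category mix matches the target. What counts as acceptable for either score depends on the task.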

Quality-Performance Relationship

Examines how data quality dimensions relate to model performance metrics. Not all quality dimensions are equally important for every task.
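
One way to express task-dependent importance is to weight per-dimension scores differently for each task. The sketch below is a hypothetical illustration only: the weighting scheme, the dimension keys, and every number in it are placeholders, not values or methods taken from the framework.

```python
def weighted_quality(scores, weights):
    """Weighted mean of per-dimension quality scores, each in [0, 1]."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    return sum(scores[dim] * w for dim, w in weights.items()) / total


# Placeholder scores for one dataset (illustrative, not measurements).
scores = {"label_quality": 1.0, "representativeness": 0.5, "freshness": 0.0}

# Hypothetical task profiles: a static task ignores freshness,
# while a task subject to drift weights it most heavily.
static_task = {"label_quality": 3, "representativeness": 2, "freshness": 0}
drifting_task = {"label_quality": 1, "representativeness": 2, "freshness": 3}

static_score = weighted_quality(scores, static_task)
drifting_score = weighted_quality(scores, drifting_task)
```

With these placeholder inputs the same dataset scores higher under the static profile than under the drifting one, which is the sense in which a dimension can matter for one task and not for another.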

Related: 04-atom—data-quality-dimensions-consensus-gap, 04-molecule—data-cascades-concept