This report documents the quality control (QC) process applied to the High-Fidelity Carbide Dataset (HF-CCD), which is derived from the Materials Project database. A five-stage QC pipeline produced a curated dataset suitable for machine learning applications in materials informatics. The key outcomes were:
- Removal of 1,110 problematic entries (19.1% of the initial dataset)
- Elimination of 1,098 duplicate structures
- Validation of crystallographic consistency across all entries
- Complete descriptor coverage (0% missing values)
- Statistical outlier detection and removal
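
The outcomes listed above correspond to checks that are straightforward to express in code. The following is a minimal sketch, not the pipeline used for HF-CCD: the record layout, the field names (`formula`, `spacegroup`, `x`), the duplicate key, and the 1.5 × IQR outlier rule are all assumptions chosen for illustration, and the toy values are made up.

```python
from statistics import quantiles


def drop_duplicates(entries, key_fields=("formula", "spacegroup")):
    """Keep the first entry per key; return (kept, number_removed).

    Assumption: two entries are duplicates when they share the same
    formula and space group. A real pipeline might compare structures.
    """
    seen = set()
    kept = []
    for entry in entries:
        key = tuple(entry[f] for f in key_fields)
        if key in seen:
            continue
        seen.add(key)
        kept.append(entry)
    return kept, len(entries) - len(kept)


def missing_fraction(entries, fields):
    """Fraction of descriptor cells that are absent or None."""
    total = len(entries) * len(fields)
    if total == 0:
        return 0.0
    missing = sum(1 for e in entries for f in fields if e.get(f) is None)
    return missing / total


def drop_iqr_outliers(entries, field, k=1.5):
    """Drop entries whose `field` lies outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: an IQR fence stands in for whatever statistical
    criterion the actual pipeline applies.
    """
    values = [e[field] for e in entries]
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [e for e in entries if low <= e[field] <= high]
    return kept, len(entries) - len(kept)


# Toy records with a hypothetical descriptor "x" (values are invented).
toy = [
    {"formula": "SiC", "spacegroup": 216, "x": 1.0},
    {"formula": "SiC", "spacegroup": 216, "x": 1.0},  # duplicate key
    {"formula": "TiC", "spacegroup": 225, "x": 1.2},
    {"formula": "ZrC", "spacegroup": 225, "x": 1.1},
    {"formula": "WC", "spacegroup": 187, "x": 0.9},
    {"formula": "B4C", "spacegroup": 166, "x": 1.3},
    {"formula": "HfC", "spacegroup": 225, "x": 9.0},  # far from the rest
]

unique, n_duplicates = drop_duplicates(toy)
coverage_gap = missing_fraction(unique, ["x"])
clean, n_outliers = drop_iqr_outliers(unique, "x")
```

Returning the number of removed entries from each step makes it easy to report per-stage counts of the kind summarized in the list above.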