Data Quality Cited as Top Reason for AI Project Failure

According to Shiva Pillay, a senior executive at Veeam, 80-90% of AI projects fail due to fundamental data issues. In a recent podcast, Pillay stated that enterprises are trying to use fragmented, poorly governed, and inconsistently labeled data that was not designed for AI models. He emphasized that technical buyers now demand end-to-end traceability and provenance for the data they use.

- In Reinforcement Learning from Human Feedback (RLHF), a key challenge is the subjectivity and inconsistency of human annotators, which can produce conflicting training signals for the model. To mitigate this, some labs are training annotators rigorously on domain-specific nuances and ethical guidelines to improve inter-rater reliability.
- The debate between synthetic and human-labeled data hinges on a trade-off between scale and nuance. Synthetic data can be generated up to 50 times faster, but it can be up to 35% less accurate for tasks that require contextual understanding. Many teams find a hybrid approach optimal: synthetic data for broad coverage, and human labeling to refine performance on critical edge cases.
- Constitutional AI, a technique developed by Anthropic, aims to reduce reliance on large-scale human feedback by giving the AI a set of principles, or "constitution," to guide its responses. Human oversight is still critical, however, for defining these initial rules and ensuring they align with human values.
- Evaluating agentic AI systems, which make decisions and take actions, requires new benchmarks beyond traditional model evaluations. Benchmarks such as TRAIL, GAIA, and WebArena are emerging to test an agent's ability to complete multi-step tasks, use tools correctly, and recover from errors. These evaluations often require human-annotated traces of an agent's reasoning and actions.
- For AI infrastructure startups, go-to-market (GTM) strategy is shifting from traditional sales funnels to "intelligent systems" that use AI for market analysis, messaging, and identifying buyer intent. Gartner predicts that by 2026, 70% of startups will adopt AI-driven GTM tools to increase speed and precision.
- The fundraising climate for AI startups has shifted significantly toward capital efficiency and defensibility. While global AI funding reached over $202 billion in 2025, investors are scrutinizing burn rates and infrastructure costs more heavily as the "growth at all costs" era ends.
- The data labeler's role is evolving from low-skill gig work to that of a high-skill "AI tutor" with domain expertise. As AI models tackle more complex tasks such as medical diagnosis and legal analysis, demand is increasing for specialists who can provide nuanced, context-rich feedback. This opens new career pathways, with data labelers advancing to roles such as quality control analyst and AI trainer.
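The RLHF item above mentions inter-rater reliability without naming a metric. One common choice for two annotators is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), which corrects the observed agreement p_o for the agreement p_e expected by chance. The following is a minimal Python sketch under that assumption; the function name and the sample pairwise-preference labels are hypothetical and do not come from any lab's actual pipeline.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so by convention this sketch reports perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical RLHF preference labels: which of two responses ("A" or "B") each annotator preferred.
annotator_1 = ["A", "A", "B", "B", "A", "B", "A", "A"]
annotator_2 = ["A", "B", "B", "B", "A", "B", "A", "B"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # prints 0.529
```

Here the annotators agree on 6 of 8 items (75% raw agreement), but because their label frequencies alone would produce about 47% agreement by chance, kappa drops to roughly 0.53. That gap is why annotator training programs typically track a chance-corrected score rather than raw agreement.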
