India plans 'data city' as OpenAI, Google expand

India is positioning itself as a major AI hub, with plans to build a massive "data city" to serve as foundational infrastructure. The project is reportedly anchored by a $15 billion deal with Google. OpenAI's CEO has also been advocating for a "Democratic AI" approach in the country, seeking government and enterprise partnerships to leverage India's talent pipeline.

- The planned "data city" will be located in Visakhapatnam, a port city in the state of Andhra Pradesh, and is part of a larger initiative involving 760 projects with investment agreements totaling $175 billion. The vision extends beyond just data centers to create a self-sustaining digital ecosystem across a 100-kilometer radius, including server manufacturing and advanced cooling systems. To attract major investors, the state is offering land at highly subsidized rates.
- A joint venture between Reliance Industries, Brookfield Asset Management, and Digital Realty is also investing $11 billion to develop a separate AI-focused data center in Visakhapatnam. This aggressive infrastructure build-out is part of India's strategy to narrow the AI gap with the United States and China. The city is also being developed as a key landing point for submarine internet cables connecting India to Singapore, which will significantly improve international connectivity.
- OpenAI's "Democratic AI" strategy in India focuses on three pillars: ensuring broad access to AI tools regardless of income or education, driving adoption in sectors like schools and small businesses, and building "AI literacy" to empower users. India has become OpenAI's second-largest user base after the U.S., with students being the largest user group of ChatGPT worldwide. The company opened its first Indian office in Delhi in August 2025 and plans to expand its team and partnerships with the Indian government.
- AI labs are increasingly using a technique called Constitutional AI to align models with human values, reducing the dependency on extensive human feedback. This method involves providing the AI with a "constitution" or a set of principles to self-critique and revise its own outputs, which helps to automate and scale the alignment process. This is a shift from traditional Reinforcement Learning from Human Feedback (RLHF), which can be a bottleneck due to its reliance on human labelers for every correction.
- For emerging agentic AI systems (which can plan, reason, and act), evaluation is moving beyond simple accuracy metrics. Labs now use a combination of synthetic task benchmarks, replays of real historical tasks, and human-in-the-loop feedback to measure performance across dimensions like task success rate, cost, latency, and decision-making quality. This creates a need for more nuanced data that can validate the entire reasoning process of an AI agent, not just the final output.
- While synthetic data is faster and more cost-effective for training AI on a large scale, it often lacks the nuance and accuracy for context-sensitive tasks that human-labeled data provides. Research shows that a hybrid approach is often most effective; models trained primarily on synthetic data can see significant performance improvements by incorporating even small amounts of high-quality, human-labeled data. Human annotators remain crucial for handling ambiguity, mitigating bias, and providing the domain expertise that algorithms cannot replicate.
- The fundraising climate for AI startups remains strong, with AI companies raising a third of all venture capital. However, investors are becoming more selective, favoring companies with clear product-market fit and scalable technology. Seed-stage AI startups, in particular, are seeing significantly higher valuations than their non-AI counterparts, with a median pre-money valuation 42% higher in 2024.
- Data quality is a primary bottleneck in AI training pipelines, with most AI/ML project failures rooted in poor data rather than flawed models. This forces data science teams to spend a significant amount of time cleaning and preparing data, which slows down development cycles and increases costs as expensive GPUs sit idle. Establishing clear ownership and embedding quality checks throughout the data lifecycle are critical to building reliable AI systems.
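
For readers curious what the multi-dimensional agent evaluation mentioned above might look like in practice, here is a minimal sketch in Python. It is an illustration only, not any lab's actual method: the `AgentRun` record, its field names, the `summarize` helper, and the sample numbers are all hypothetical. It aggregates three of the dimensions named in the briefing (task success rate, cost, latency) over a batch of agent runs.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class AgentRun:
    """One evaluated agent task (hypothetical record layout)."""
    task_id: str
    succeeded: bool
    cost_usd: float
    latency_s: float


def summarize(runs):
    """Aggregate success rate, mean cost, and p95 latency over a batch."""
    if not runs:
        raise ValueError("no runs to summarize")
    latencies = sorted(r.latency_s for r in runs)
    # Nearest-rank 95th percentile: smallest value covering 95% of runs.
    rank = math.ceil(0.95 * len(latencies))
    return {
        "success_rate": sum(r.succeeded for r in runs) / len(runs),
        "mean_cost_usd": mean(r.cost_usd for r in runs),
        "p95_latency_s": latencies[rank - 1],
    }


# Made-up sample data, purely for illustration.
runs = [
    AgentRun("t1", True, 0.10, 2.0),
    AgentRun("t2", False, 0.30, 8.0),
    AgentRun("t3", True, 0.20, 4.0),
    AgentRun("t4", True, 0.20, 6.0),
]
summary = summarize(runs)
print(summary)
```

Decision-making quality, the fourth dimension the briefing lists, is left out here because it typically needs human or model-based grading of the reasoning trace rather than a simple aggregate.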
