Lost Authors, Trained Machines: Rethinking Copyright for Orphan Works in the Age of AI
In the age of artificial intelligence (AI), massive volumes of digital content, including music, text, and images, are being used to train generative models. Among these materials lie orphan works: copyrighted works whose authors or right holders cannot be traced despite a diligent search. The inclusion of such works in AI training datasets has raised complex legal and ethical questions, particularly in India, where copyright law remains silent on the issue. This paper investigates the under-researched issue of orphan works in AI training in India, focusing on the copyright questions their use raises. Although the EU and the US have tried to develop limited schemes that allow for the lawful use of orphan works, India has yet to adopt any statutory recognition, licensing regime, or safe harbor for their use. The absence of a statutory provision facilitating the use of orphan works leaves developers in a legal quagmire, potentially jeopardizing innovative technologies and India’s ability to share its cultural heritage with the world. Combining doctrinal analysis and comparative legal research, this study traces the historical evolution of orphan works and the conceptual tensions they create in copyright jurisprudence. It evaluates the current role of such works in AI training systems, where the lack of attribution and ownership metadata often results in their inadvertent use. The paper then explores global best practices and identifies key gaps in India’s legal framework for orphan works. Beyond examining the legal framework, the paper raises critical questions: Should AI developers be permitted to use orphan works to train AI systems? Is their inclusion an act of exploitation, or can it be justified on grounds of innovation and public interest?
The research ultimately calls for a nuanced Indian framework that equitably balances the rights of unknown creators against the imperatives of technological progress. It recommends a statutory model for the lawful use of orphan works for AI training and digital archiving purposes, founded on ethical and constitutional principles. In doing so, it proposes a pragmatic, ethics-based roadmap that connects copyright law with emerging technologies and grounds it in a justice-based policy rationale.