Understanding Trust and Safety in AI from Code to Creativity

The Software Freedom Law Centre, in collaboration with Vishwam.ai, Free and Open Source Software United, and the Linux Foundation, organised a stakeholder consultation under the title 'Understanding Trust and Safety in AI from Code to Creativity'. The event opened with an introduction to two round table discussions: one on harvesting open source AI and one on AI innovation and copyright. The opening remarks were delivered by Professor Sandeep Kumar Shukla, director of IIIT Hyderabad. A chief concern he raised was a finding from the Conference on Neural Information Processing Systems, where hallucinated references were discovered in papers, in all likelihood the result of improperly developed AI systems. He also raised the question of which datasets AI has been trained on and whether those datasets contain biases.

Subsequently, the founder of Swecha, Kiran Chandra, said that data is the key driver, that it must be verifiable, and that the owner of the data should be acknowledged. He then posed two questions that need to be considered: how do we protect the interests of those who produce the data, and how does that data contribute back to society?

Mishi Choudhary, the erstwhile legal director and founder of the Software Freedom Law Centre, addressed the Gen AI Copyright Report, specifically critiquing the Department for Promotion of Industry and Internal Trade's proposal for a "hybrid licensing model." She argued that it removes creator autonomy by legitimising data scraping under the guise of innovation.

Poruri Sai Rahul, the Chief Executive Officer of Free and Open Source Software United, represented the developer community's interest in maintaining open weights and transparent methodologies, especially when public money is used for AI research.

Before turning to the two round table discussions themselves, one concerning problem deserves mention: dark patterns, deceptive interfaces that manipulate users into buying overpriced insurance or signing up for recurring charges. In response, the Electronic Frontier Foundation and Consumer Reports created a tip line to collect reports of dark patterns and make them available to the public.

In a panel discussion, Professor Rahul De observed that until about six months ago companies were unwilling to commit to a particular model, but that this has recently changed. Another issue raised was transparency, which could mitigate risks by giving researchers and authorities the opportunity to audit model performance, along with the need for incentives for such audits to be conducted. There is also the risk that open source AI, if distributed without restriction, may fall into the hands of bad actors. Further, the scale of investment in AI, particularly in research and development in India, has not kept pace with developments in China, and the worry was presented that Indian AI development might be reduced to API wrapper companies. Finally, there is the question of how AI is to be regulated: what if an AI system were trained to learn something wrong, and what of the possibility of AI companies copying data?

It is primarily to address these latter concerns that a new Open Language Justice Licence has been created, which seeks to meet the challenges of responsible attribution of linguistic data and its relation to AI development. Existing regulatory models were evaluated, along with an assessment of what a hybrid model might require: legally sanctioned access to data from which developers could train their AI models; fair compensation to copyright holders; rate setting through a quick, transparent process; a mechanism to review those rates; upholding the basic principles of copyright and reward for creativity; and limiting the risk of litigation and disputes. A worrying scenario is that data companies may harvest from the commons, create black-box models, and withhold their weights and parameters from the public, so that nothing is contributed back to the commons. That said, closed and private channels of communication between businesses may have advantages, and indeed be necessary in the interest of industry, so that competitors do not misuse data, misrepresent agents, or sell insider information to influence stocks. The verifiability of the origins of data is key to establishing data provenance.

The key contributors to the discussion were Nidhi, who managed logistics and timing updates; Ana Enriquez and Jatin Gandhi, who tested audio and asked about session logistics; Swaraj Paul Baruah, who shared copyright sources; and the admin, who shared the links to the data licence draft for collaborative review.

A point of deliberation was the relationship between journalism and tech companies and how AI has been changing it. Whether journalism can survive in the present economy was a concern that was brought up, and the example of Australia, where tech companies actually pay journalists, was presented as a possible model that could make the industry sustainable through synergy.

There was pronounced interest among the audience in the transcripts and recordings so that the discussion could be revisited.