Real Open Source Artificial Intelligence (AI)
This page describes the fundamental components required for an AI to be classified as Open Source and Freedom Software, and why these classifications matter.
Open Source Requirements
- AI model source code: The source code should be freely accessible and published under a license approved by organizations such as the OSI or FSF, or one that complies with the Debian Free Software Guidelines (DFSG).
- Training data: Ideally, the training data should also be available under an approved license. However, in some cases this is not possible, e.g. with personal data. If the training data is non-freedom, the project would fall into categories like "contrib", as is the case with Debian.
- Build documentation: Clear step-by-step instructions for compiling or training the model from source code must be provided so that third parties can reproduce the model.
- Dependencies: All software libraries and packages required for the AI model should also be Open Source or Freedom Software. Dependencies that are non-freedom would also put the software in a "contrib" category.
- Configuration files and scripts: Often, in addition to the items mentioned above, special configuration files or scripts are also required to successfully train or run the AI model. These should also be under an approved license.
- License File: A clear license file that explains the terms and conditions for use, modification, and distribution of the software is essential.
Overall, free access to all resources required for the AI model is crucial for its classification as Open Source and Freedom Software. Without these components, the project could be considered partially free, but would not meet the full criteria for Open Source and Freedom Software.
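The checklist above can be read as a simple decision rule. The following is a minimal sketch of that reading, not an official policy: the field names, the order of the checks, and the exact category labels are assumptions made for illustration, loosely following the Debian-style "contrib" idea mentioned above.

```python
from dataclasses import dataclass


@dataclass
class AIProject:
    """Hypothetical record of which checklist items a project satisfies."""
    source_code_free: bool         # model source code under an approved license
    license_file_present: bool     # clear license file shipped with the project
    build_docs_present: bool       # instructions to compile/train from source
    config_and_scripts_free: bool  # configuration files and scripts are free
    training_data_free: bool       # training data under an approved license
    dependencies_free: bool        # all required libraries are free


def classify(project: AIProject) -> str:
    """Return an illustrative category label (assumed labels, not a standard)."""
    # Without freely licensed source code and a license file, the project
    # cannot be called Open Source at all.
    if not (project.source_code_free and project.license_file_present):
        return "not Open Source"
    # Missing build instructions or non-free scripts prevent third parties
    # from reproducing the model.
    if not (project.build_docs_present and project.config_and_scripts_free):
        return "partially free"
    # Free code that relies on non-freedom training data or dependencies
    # falls into a "contrib"-like category.
    if not (project.training_data_free and project.dependencies_free):
        return "contrib"
    return "Open Source and Freedom Software"


if __name__ == "__main__":
    example = AIProject(True, True, True, True, False, True)
    print(classify(example))
```

In this sketch, a project with free code but non-freedom training data is reported as "contrib", while a project that satisfies every item is reported as fully Open Source and Freedom Software.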
Reproducible / Deterministic Builds
Reproducible or deterministic builds are a crucial aspect of Open Source and Freedom Software as they contribute to the transparency, trustworthiness, and verifiability of the software. A reproducible build ensures that given the same source code, build environment and build instructions, the binary output will always be identical. This is vital for verifying that the build is free from malicious alterations or unintended deviations from the source code.
- Verification: By comparing the checksum of the build output from an independent build process with the checksum of the official release, any discrepancies can be identified, ensuring that the binary has been compiled correctly and hasn’t been tampered with.
- Debugging: Deterministic builds make debugging easier as developers can work with exact copies of the software, ensuring consistency between testing and production environments.
- Collaboration: When multiple developers or teams work on the same project, reproducible builds ensure that everyone is working with the exact same binary, reducing the likelihood of inconsistent behavior and bugs due to environment differences.
- Compliance and Auditing: For projects that require adherence to certain regulatory or compliance standards, reproducible builds provide a clear audit trail of what code was compiled and how.
- Long-Term Maintenance: Reproducible builds require the build environment and build instructions to be recorded precisely. For a project that is maintained or updated over a long period, this makes it possible to recreate the original build later, which simplifies future maintenance and debugging.
Reproducible builds are an essential practice in achieving the goals of Open Source and Freedom Software, contributing significantly to the integrity, transparency, and community collaboration inherent in these projects.
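The verification step described above, comparing the checksum of an independent build with the checksum of the official release, can be sketched as follows. This is a minimal illustration that assumes SHA-256 as the checksum and assumes both artifacts are available as local files. The function names and file paths are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large model files do not need
        # to fit into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(official_path, rebuilt_path):
    """True if the official release and the independent rebuild are identical."""
    return sha256_of(official_path) == sha256_of(rebuilt_path)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names):
    #   python verify.py official-release.bin my-rebuild.bin
    official, rebuilt = sys.argv[1], sys.argv[2]
    if builds_match(official, rebuilt):
        print("OK: rebuild is bit-for-bit identical to the official release")
    else:
        print("MISMATCH: the binaries differ")
```

A mismatch does not by itself prove tampering. It can also mean the build is not yet deterministic, for example because timestamps or file ordering leak into the output.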
Misuse of the Term Open Source in the Context of Some AI Projects
The misuse of the term "Open Source" by certain members of the AI community is concerning, as it can lead to misunderstandings about the actual licensing and accessibility of the AI projects in question. The word "Real" had to be prefixed to "Open Source" in the title of this page to underscore this issue.
One example is Meta (formerly Facebook) releasing a large AI language model that certain publications heralded as an open-source AI project. These articles misrepresent the true nature of the release, as Meta's AI is not genuinely Open Source. Anyone who attempts to download the model is confronted with a proprietary license agreement, which shows that it does not conform to the Open Source ethos of freedom and accessibility.
This misrepresentation can mislead individuals and organizations interested in using or contributing to Open Source AI projects. It is essential for the community to use the term "Open Source" accurately, so that it remains synonymous with the principles of free, accessible, and transparent software development.
The term Open Source has been established for decades, embodying a set of values centered around transparency, collaboration, and freedom in software development.
- Brief update about software freedom and artificial intelligence
- ML-Policy: Unofficial Policy for Debian & Machine Learning
- OSI (Open Source Initiative) report: What does it mean for an AI system to be Open Source?
- Rethink about who we (Debian) are in rapid dev cycle of deep learning
- Quote: "Dolly 2.0, the first open source, instruction-following LLM". But is it really Open Source? To find out, the ticket "step by step instructions on how to build this AI from source code" was created.