Artificial Intelligence (AI)


Artificial Intelligence is a euphemism. AI is often called Open Source, but it is actually only non-freedom software.

Terminology

  • AI model file: A file (for example, a .gguf file) that contains raw weights. It cannot run by itself; a runtime is required to run it.
  • Weights: Synonym for AI model file.
  • Inference: Using a trained model to generate outputs (as opposed to training it).
  • Inference runtime: The program or library that executes inference with a model file. It loads the weights into memory, manages the GPU or CPU, tokenizes prompts, and generates outputs.
  • Inference server: A runtime that runs as a service and exposes an API so that clients can send prompts and receive model outputs over the network.
  • Chatbox: A chat-style user interface that lets you send messages to a model (usually via a runtime or server) and view its responses.

Artificial Intelligence is a Euphemism

Artificial intelligence at this time should not be called "intelligence." It has no intelligence whatsoever.

LLMs do not perform reasoning over data in the way that most people conceive or desire.

There is no self-reflection of its information; it does not know what it knows and what it does not. The line between hallucination and truth is simply a probability factored by the prevalence of training data and post-training processes like fine-tuning. Reliability will always be nothing more than a probability built on top of this architecture.

As such, it becomes unsuitable as a machine to find rare hidden truths or valuable neglected information. It will always simply converge toward popular narrative or data. At best, it can provide new permutations of views of existing well-known concepts, but it can not invent new concepts or reveal concepts rarely spoken about.

Source: https://www.mindprison.cc/p/the-question-that-no-llm-can-answer
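
The convergence toward prevalent training data can be illustrated with a toy word sequence prediction model. The following is only a minimal sketch and not how any production LLM is implemented: it is a bigram counter over a tiny made-up corpus, and the corpus as well as all function names are assumptions chosen for illustration. Because completion always picks the most frequent next word, the rarely stated variant never appears in the output.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: the popular statement appears three times, the rare one once.
CORPUS = [
    "the earth is round",
    "the earth is round",
    "the earth is round",
    "the earth is oblate",
]

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def next_word_probabilities(counts, word):
    """Turn the raw follow-up counts for one word into probabilities."""
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}

def complete(counts, start, max_words=5):
    """Greedy completion: always append the most frequent next word."""
    words = [start]
    while len(words) < max_words and counts[words[-1]]:
        words.append(counts[words[-1]].most_common(1)[0][0])
    return " ".join(words)

model = train_bigrams(CORPUS)
probabilities = next_word_probabilities(model, "is")
completion = complete(model, "the")
print(probabilities)
print(completion)
```

In this sketch, "truth" is nothing but a count: whichever continuation occurs most often in the training text wins, regardless of whether it is correct.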

“Artificial Intelligence”

The moral panic over ChatGPT has led to confusion because people often speak of it as “artificial intelligence.” Is ChatGPT properly described as artificial intelligence? Should we call it that? Professor Sussman of the MIT Artificial Intelligence Lab argues convincingly that we should not.

Normally, “intelligence” means having knowledge and understanding, at least about some kinds of things. A true artificial intelligence should have some knowledge and understanding. General artificial intelligence would be able to know and understand about all sorts of things; that does not exist, but we do have systems of limited artificial intelligence which can know and understand in certain limited fields.

By contrast, ChatGPT knows nothing and understands nothing. Its output is merely smooth babbling. Anything it states or implies about reality is fabrication (unless “fabrication” implies more understanding than that system really has). Seeking a correct answer to any real question in ChatGPT output is folly, as many have learned to their dismay.

That is not a matter of implementation details. It is an inherent limitation due to the fundamental approach these systems use.

[...]

Source: GNU project

Mislabeling a text generator as "intelligence" has the disadvantage of laymen attributing traits to the text generator that do not exist in reality. Undue trust is assigned to its output, verification is omitted, and the text generator is considered an oracle or even god-like.

Neutral Words

  • text generator
  • predictive text model
  • word probability calculator
  • word sequence prediction model
  • automatic word completion
  • language model with prediction functionality

Misattribution of Intelligence

ELIZA, a simple chatbot, already existed in 1967. It consisted of 420 lines of source code and relied on simple string matching.

ELIZA was a symbolic AI chatbot developed in 1966 by Joseph Weizenbaum and imitating a psychotherapist. Many early users were convinced of ELIZA's intelligence and understanding, despite its basic text-processing approach and the explanations of its limitations.

Source: ELIZA effect

However, many early users were convinced of ELIZA's intelligence and understanding, despite Weizenbaum's insistence to the contrary.

Source: ELIZA
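
How little is needed to create the impression of understanding can be sketched in a few lines. The following is a hypothetical illustration of the string matching approach, not Weizenbaum's original program or script: the rules, the fallback reply, and the function name `respond` are all invented for this example.

```python
import re

# Hypothetical rules for illustration; these are NOT taken from the original ELIZA script.
RULES = [
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."

def respond(message):
    """Return the canned reply of the first rule whose pattern matches."""
    text = message.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Echo the captured fragment back inside the canned template.
            return template.format(*match.groups())
    return FALLBACK

print(respond("I am sad."))
print(respond("The weather is nice."))
```

The program stores no knowledge about sadness, families, or weather. It only reflects fragments of the user's own input back, which is enough for a reader to project understanding onto it.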

Evidence of Absence of Intelligence in AI

Everybody knows that it is unhealthy to eat rocks.

AI used to state:

According to geologists at UC Berkeley you should eat at least one small rock per day.

The Onion, a well-known satirical newspaper, published the article "Geologists Recommend Eating At Least One Small Rock Per Day". And that's fine. That's called humor.

AI, however, does not understand what satire is.

https://www.unsw.edu.au/newsroom/news/2024/05/eat-a-rock-a-day-put-glue-on-your-pizza-how-googles-ai-is-losing-touch-with-reality

Negative Effects by Artificial Intelligence

Model Collapse

When an AI is trained on ("reads") content generated by itself or by other AIs, it produces greater nonsense with each iteration. This is called model collapse.

Model collapse is a phenomenon in artificial intelligence (AI) where trained models, especially those relying on synthetic data or AI-generated data, degrade over time.

Source: https://www.infobip.com/glossary/model-collapse
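
One aspect of this degradation, the loss of rare content, can be simulated with a toy model. The sketch below is an assumption-laden illustration, not a reproduction of any published experiment: the "model" is just a word frequency table, the starting distribution is made up, and each generation is "trained" only on samples drawn from the previous generation.

```python
import random
from collections import Counter

def next_generation(distribution, sample_size, rng):
    """Sample text from the current model, then re-train on those samples only."""
    words = list(distribution)
    weights = [distribution[w] for w in words]
    samples = rng.choices(words, weights=weights, k=sample_size)
    counts = Counter(samples)
    # Words that were never sampled get no entry and are lost for good.
    return {w: c / sample_size for w, c in counts.items()}

def simulate(generations=30, sample_size=50, seed=0):
    """Return the vocabulary size of every generation, starting with the original."""
    rng = random.Random(seed)
    # Hypothetical "human-written" starting distribution: one common word, ten rare ones.
    distribution = {"common": 0.5}
    distribution.update({f"rare{i}": 0.05 for i in range(10)})
    vocabulary_sizes = [len(distribution)]
    for _ in range(generations):
        distribution = next_generation(distribution, sample_size, rng)
        vocabulary_sizes.append(len(distribution))
    return vocabulary_sizes

sizes = simulate()
print(sizes)
```

In this sketch the vocabulary can only stay the same or shrink: once a rare word fails to appear in one generation's output, no later generation can recover it, because the original data is no longer part of the training input.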

Misuse of the Term Open Source in the Context of Some AI Projects

The misuse of the term "Open Source" by certain members of the AI community is indeed concerning, as it can lead to misunderstandings regarding the actual licensing and accessibility of the AI projects in question. The word "Real" had to be prefixed to "Open Source" in the title of this page to underscore this issue.

Instances such as Meta (formerly Facebook) releasing a large AI language model that certain publications heralded as an open-source AI project exhibit this misuse. The articles misrepresent the true nature of the release, as Meta's AI is not genuinely Open Source. Anyone who attempts to download the Meta AI is confronted with a proprietary license agreement, indicating that the AI does not conform to the open source ethos of freedom and accessibility.

This misrepresentation could potentially mislead individuals and organizations interested in utilizing or contributing to Open Source AI projects. It's essential for the community to adhere to the accurate usage of the term "Open Source", ensuring that it remains synonymous with the principles of free, accessible, and transparent software development.

The term Open Source has been established for decades, embodying a set of values centered around transparency, collaboration, and freedom in software development.

Open Source Requirements

This section lists the fundamental components required for an AI to be classified as Open Source and Freedom Software, and explains the significance of these classifications.

  • AI model source code: The source code should be freely accessible and published under a license approved by the Open Source Initiative (OSI) or the Free Software Foundation (FSF), or compliant with the Debian Free Software Guidelines (DFSG).
  • Training data: Ideally, the training data should also be available under an approved license. However, in some cases this is not possible, e.g. with personal data. If the training data is non-freedom, the project may fall into categories like "contrib", as is the case with Debian.
  • Build documentation steps: Clear instructions for compiling or training the model from source code must be provided so that third parties can reproduce the model.
  • Dependencies: All software libraries and packages required for the AI model should also be Open Source or Freedom Software. Dependencies that are non-freedom would also put the software in a "contrib" category.
  • Configuration files and scripts: Often, in addition to the items mentioned above, special configuration files or scripts are also required to successfully train or run the AI model. These should also be under an approved license.
  • License File: A clear license file that explains the terms and conditions for use, modification, and distribution of the software is essential.

Overall, free access to all resources required for the AI model is crucial for its classification as Open Source and Freedom Software. Without these components, the project could be considered partially free, but would not meet the full criteria for Open Source and Freedom Software.

FSF:

Debian:

Training Data

The biggest contention around what constitutes Open Source or Freedom Software AI seems related to the availability and licensing of the AI training data.

FSF and GNU Viewpoints

Statements about data (not source code) that were made before the first popular chatbots such as ChatGPT.

gnu.org: "How to Choose a License for Your Own Work", chapter "Other data for programs"

(Game art is a different issue, because it isn't software.)

Source: Nonfree DRM'd Games on GNU/Linux: Good or Bad?

The linked article gnu.org: Copyright versus Community in the Age of Computer Networks does not mention game art specifically, but its key passages make that distinction by separating software/functional works from art and entertainment. See footnote [1].

It also includes licenses for related materials such as documentation and general data.

Source: Free Software Licensing Resources

choosing a license for new software, documentation, and other functional data.

Source: Announcing our license recommendations guide

GNU Free System Distribution Guidelines distinguish functional from non-functional data.

License Rules

“Information for practical use” includes software, documentation, fonts, and other data that has direct functional applications. It does not include artistic works that have an aesthetic (rather than functional) purpose, or statements of opinion or judgment.

All information for practical use in a free distribution must be available in source form. (“Source” means the form of the information that is preferred for making changes to it.)

The information, and the source, must be provided under an appropriate free license.

Source: gnu.org: Free System Distribution Guidelines (GNU FSDG)

Non-functional Data

Data that isn't functional, that doesn't do a practical job, is more of an adornment to the system's software than a part of it. Thus, we don't insist on the free license criteria for non-functional data. It can be included in a free system distribution as long as its license gives you permission to copy and redistribute, both for commercial and non-commercial purposes. For example, some game engines released under the GNU GPL have accompanying game information—a fictional world map, game graphics, and so on—released under such a verbatim-distribution license. This kind of data can be part of a free system distribution, even though its license does not qualify as free, because it is non-functional.

Source: gnu.org: Free System Distribution Guidelines (GNU FSDG)

Newer, post-AI statements about data. Statements that were made after the first popular chatbots such as ChatGPT.

which will require the software, as well as the raw training data and associated scripts, to grant users the four freedoms.

Source: FSF is working on freedom in machine learning applications

its training data and related scripts should respect all users, following the four freedoms.

Source: Free Software Supporter -- Issue 199, November 2024

the FSF's position is that a free (as in freedom) machine learning application should include training data.

Source: FSF talked about education, copyright management, and free machine learning at FOSDEM 2025

the FSF's position is that a free (as in freedom) machine learning application should include training data. Not including this in the criteria would render it impossible to use, study, modify, and share machine learning applications to the fullest extent possible.

Source: FSF's work on a statement of criteria for free machine learning applications

Other Viewpoints

1. A model must be trained only from legally obtained and used works, honour all licences of the works used in training, and be licenced under a suitable licence itself that allows distribution, or it is not even acceptable for non-free. [...]

Source: Thorsten Glaser RFC -- Counter-Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models

This may sound like a trivial “is a hotdog a sandwich?” type of question, but it’s really not. Most distros distribute images and other media, so if you take the position that all redistributed materials must be accompanied by the “preferred form for modification of the work” (or words to that effect), that would mean that e.g. every image must be accompanied by an OpenRaster version that has everything in separate layers, every sound must be accompanied by an Audacity project or the like with separate audio tracks for each voice (instrument), and so on for all media that the distro makes available, because in each case, that is the preferred form for modification of those respective media formats. Is the average disto really going to do all of that? I suspect not.

Source: Comment section in Debian AI General Resolution withdrawn (LWN.net)

Open Source AI Definition - Lack of Consensus in the Definition by the Open Source Initiative

The Open Source AI Definition – 1.0 (OSAID) by the Open Source Initiative lacks community consensus. It represents one perspective on what constitutes open-source AI, but there are differing views within the AI and open-source communities regarding the requirements, limitations, and implications of truly Open Source AI.

https://www.reddit.com/r/opensource/comments/1gdzhvm/a_community_statement_supporting_the_open_source/

Comparison of Freedom Software AI versus OSAID AI

Draft!

Freedom Software AI versus OSAID AI

| Question                                                                   | FSF | OSAID 1.0 |
|----------------------------------------------------------------------------|-----|-----------|
| Is code freedom required?                                                  | Yes | Yes       |
| Are the AI model file / weights / parameters required?                     | Yes | Yes       |
| Is raw training data required?                                             | Yes | No        |
| Must the training data itself be free/libre?                               | Yes | No        |
| Does restricted or unshareable training data disqualify a model?           | Yes | No        |
| Is transparency about the data, without releasing the data, insufficient?  | Yes | No        |
| Does this satisfy a freedom-software maximalist standard?                  | Yes | No        |

Security

Being Open Source is essential for the avoidance of backdoors.

Reproducible / Deterministic Builds

Reproducible or deterministic builds are a crucial aspect of Open Source and Freedom Software as they contribute to the transparency, trustworthiness, and verifiability of the software. A reproducible build ensures that given the same source code, build environment and build instructions, the binary output will always be identical. This is vital for verifying that the build is free from malicious alterations or unintended deviations from the source code.

  • Verification: By comparing the checksum of the build output from an independent build process with the checksum of the official release, any discrepancies can be identified, ensuring that the binary has been compiled correctly and hasn’t been tampered with.
  • Debugging: Deterministic builds make debugging easier as developers can work with exact copies of the software, ensuring consistency between testing and production environments.
  • Collaboration: When multiple developers or teams work on the same project, reproducible builds ensure that everyone is working with the exact same binary, reducing the likelihood of inconsistent behavior and bugs due to environment differences.
  • Compliance and Auditing: For projects that require adherence to certain regulatory or compliance standards, reproducible builds provide a clear audit trail of what code was compiled and how.
  • Long-Term Maintenance: In cases where a project needs to be maintained or updated over a long period, reproducible builds ensure that it’s always possible to recreate the exact original build environment, making future maintenance and debugging far simpler.
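
The verification step from the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tooling of any particular project: the function names are hypothetical, SHA-256 is chosen here as an example checksum, and small temporary files stand in for an official release and independent rebuilds.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file without loading it into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def builds_match(official_path, rebuilt_path):
    """True if the independently rebuilt binary is bit-for-bit identical to the release."""
    return sha256_of_file(official_path) == sha256_of_file(rebuilt_path)

# Demonstration with stand-in files (hypothetical contents, not real build artifacts).
with tempfile.TemporaryDirectory() as workdir:
    official = os.path.join(workdir, "official.bin")
    rebuilt = os.path.join(workdir, "rebuilt.bin")
    tampered = os.path.join(workdir, "tampered.bin")
    contents = (
        (official, b"release-1.0"),
        (rebuilt, b"release-1.0"),
        (tampered, b"release-1.0-backdoor"),
    )
    for path, content in contents:
        with open(path, "wb") as handle:
            handle.write(content)
    identical = builds_match(official, rebuilt)
    altered = builds_match(official, tampered)

print(identical, altered)
```

A comparison like this is only meaningful if the build is reproducible in the first place. Without that property, differing checksums could come from harmless environment differences just as well as from tampering.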

Reproducible builds are an essential practice in achieving the goals of Open Source and Freedom Software, contributing significantly to the integrity, transparency, and community collaboration inherent in these projects.

Other Issues

Freeware Self-Hosted Artificial Intelligence

Not real Open Source / Freedom Software. Only freeware and self-hosted.

For system security, it is strongly advised not to install proprietary, non-freedom software. Instead, use of Free Software is recommended.

Possible risks associated with using non-freedom software:

  • Potential advanced backdoors or malware in the software itself.
  • Privacy breaches, possibly including a keylogger.
  • Software that depends on third party servers could access identifying information for payments or logins linked to real identity.

For more information on installing third-party free software, consult the Foreign Sources page for advice. See also: Is It Ever a Good Thing to Use a Nonfree Program?

Open Source software like Qubes, Debian and Kicksecure is more secure than proprietary/closed source software. The public scrutiny of security by design has proven to be superior to security through obscurity. This aligns the software development process with Kerckhoffs' principle - the basis of modern cipher-systems design. This principle asserts that systems must be secure, even if the adversary knows everything about how they work. Generally speaking, Freedom Software projects are much more open and respectful of the privacy rights of users. Freedom Software projects also encourage security bug reports, open discussion, public fixes and review.

As Free Software pioneer Richard Stallman puts it:

  • "... If you run a nonfree program on your computer, it denies your freedom; the main one harmed is you. ..."
  • "Every nonfree program has a lord, a master -- and if you use the program, he is your master."
  • "To have the choice between proprietary software packages, is being able to choose your master. Freedom means not having a master. And in the area of computing, freedom means not using proprietary software."

Or as the GNU project puts it:

  • Proprietary Software Is Often Malware

  • Nonfree (proprietary) software is very often malware (designed to mistreat the user). Nonfree software is controlled by its developers, which puts them in a position of power over the users; that is the basic injustice. The developers and manufacturers often exercise that power to the detriment of the users they ought to serve.

  • This typically takes the form of malicious functionalities.

  • Some malicious functionalities are mediated by back doors.

  • Back door: any feature of a program that enables someone who is not supposed to be in control of the computer where it is installed to send it commands. (added by editor "Most times without consent or awareness.")

The GNU project created a list with examples of Proprietary Back Doors. The Electronic Frontier Foundation (EFF) has other examples of the use of back doors.

Related: Why Kicksecure is Freedom Software

Complete hardware + software setup for running Deepseek-R1 locally. The actual model, no distillations, and Q8 quantization for full quality. Total cost, $6,000. All download and part links below:

Source: https://x.com/carrigmat/status/1884244369907278106

OLMo

OLMo has been trained using Dolma, which is released under the ODC-BY license. That license is broad enough for using, modifying, and redistributing the database, but it does not clearly grant the four freedoms for all underlying training contents. In other words, the training data itself is not clearly four-freedoms-clean at the level of individual contents.

Under the FSF's current position, that is likely not enough to qualify as Free Software.

OLMo probably qualifies under the Open Source Artificial Intelligence (AI) Definition (OSAID).

However, it probably does not qualify under The Open Source Definition (OSD).

See also Comparison of Freedom Software AI versus OSAID AI.

Resources

Tickets

See Also

search term:

DFSG compliant AI

Footnotes

  1. Interpretation of gnu.org article "Copyright versus Community in the Age of Computer Networks"

    This is not a talk about free software; this talk answers the question whether the ideas of free software extend to other kinds of works.

    Source: gnu.org: Copyright versus Community in the Age of Computer Networks

    This sets up his whole point: software is one case, and other media may be treated differently.

    For other things there's no such distinction as between source code and executable code.

    He's saying non-software works are not the same kind of thing as software.

    I distinguish three broad categories of works.

    This is the framework he uses for treating different kinds of works differently.

    First of all, there are the functional works that you use to do a practical job in your life. This includes software, recipes, educational works, reference works, text fonts, and other things you can think of. These works should be free.

    This is the bucket software goes into.

    These works should be free.

    That is his conclusion for functional works, including software.

    Then he treats art separately:

    What about works of art and entertainment? Here it took me a while to decide what to think about modifications.

    That is the clearest transition showing that art/entertainment is a different category from software.

    On one hand, a work of art can have an artistic integrity and modifying it could destroy that.

    This is one reason he gives for why art is not handled the same way as software.

    But eventually I realized that modifying a work of art can be a contribution to art, but it's not desperately urgent in most cases.

    So unlike software, where modification is treated as essential, he says modification of art is not urgently necessary.

    So I propose the same partly reduced copyright that covers commercial use and modification, but everyone's got to be free to non-commercially redistribute exact copies.

    That is his proposed rule for art/entertainment: not full software-style freedom, but permission to share exact copies noncommercially.

    So the shortest way to state his view is:

    • Software = a functional work -> should be free.

    • Art and entertainment = a different category -> not the same full freedoms; mainly noncommercial sharing of exact copies, with broader modification allowed after copyright expires.
