[...] as many as half [of the projects] include no easily identifiable copyright licensing information. About 30 percent include some sort of licensing information in the source files, and around 20 percent have a clear license or notice file that makes it obvious under what terms the code is made available.

Only 20 percent of all surveyed projects include proper licensing information in their repo? Not good, folks! What's sad about Simon's article is that he doesn't provide a link to the survey or any further information on how the data was acquired.
Since I'm really interested in the topic and wanted to see some numbers I rolled my own "survey" - and since I'm an Open Source Guy™ I will provide all the numbers and stuff for you to reproduce them. Let me say this as a motivator: My numbers don't even roughly match those of the mentioned survey. Let's start with the first chart:
Here you see a comparison of the total number of "interesting", "popular forked" and "popular starred" GitHub projects that include proper license information in their repo and of those that don't. What does "proper license information" mean? I defined it as this:
The project's repo contains a file named "copying", "copyright" or "*license*" (all case insensitive) in the root folder
OR
the project's repo contains a file named "readme.*" (case insensitive) that again contains a section called "license".

This is actually a formalization of common practice in Open Source projects. As you can see in the chart, nearly 140 of the 175 projects analyzed contain such easily findable license information, or more precisely 78%.
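The two criteria above could be expressed roughly like this. This is only a sketch and not necessarily what my script does: the function names, the `root_files` mapping (file name to text content) and especially the regular expression that decides what counts as a "section called license" in a readme are assumptions made for illustration.

```python
import fnmatch
import re

# File name patterns from the first criterion (matched case-insensitively).
LICENSE_FILE_PATTERNS = ("copying", "copyright", "*license*")

# Assumption: a "section called license" is a line consisting only of the
# word "license" (or "licence"), optionally wrapped in heading markup.
LICENSE_HEADING = re.compile(
    r"^[ \t#=*]*licen[sc]e[ \t:#=*]*$", re.IGNORECASE | re.MULTILINE
)


def has_license_file(filenames):
    """Criterion 1: a copying/copyright/*license* file in the root folder."""
    return any(
        fnmatch.fnmatchcase(name.lower(), pattern)
        for name in filenames
        for pattern in LICENSE_FILE_PATTERNS
    )


def has_license_section_in_readme(root_files):
    """Criterion 2: a readme.* file that contains a license section."""
    return any(
        fnmatch.fnmatchcase(name.lower(), "readme.*")
        and LICENSE_HEADING.search(content) is not None
        for name, content in root_files.items()
    )


def has_proper_license_info(root_files):
    """root_files maps each root-folder file name to its text content."""
    return has_license_file(root_files) or has_license_section_in_readme(root_files)
```

For example, `has_proper_license_info({"README.md": "# Foo\n\n## License\n\nMIT\n"})` is true because of the readme heading, while a repo whose root only contains `main.py` fails both checks.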
But I wanted to dive a little bit deeper and analyze more projects; GitHub hosts nearly 5 million repos, after all. Since you cannot easily get a list of all repos I took this approach: Get a list of the most popular programming languages from here and get the most watched projects for each of them (this is the one for JavaScript). For each of the listed projects (200 per language) I applied the search criteria from above and got the following results:
In essence this analysis of 2000 projects reflects the results from the first chart: 72% of all projects provide proper licensing information, with Ruby projects having the best ratio (90.5%) and Perl having the worst ratio (57%).
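For the curious, the per-language ratios boil down to a simple tally. The sketch below is illustrative only: `license_ratio_by_language` is a hypothetical helper, and the sample counts (181 and 114 out of 200) are not raw data but back-calculated from the percentages quoted above and the 200 projects per language.

```python
from collections import defaultdict


def license_ratio_by_language(results):
    """Return the percentage of licensed projects per language.

    `results` is an iterable of (language, has_license) pairs, one per project.
    """
    totals = defaultdict(int)
    licensed = defaultdict(int)
    for language, has_license in results:
        totals[language] += 1
        if has_license:
            licensed[language] += 1
    return {lang: 100.0 * licensed[lang] / totals[lang] for lang in totals}


# Illustrative input: counts derived from the quoted 90.5% and 57% ratios.
sample = (
    [("Ruby", True)] * 181 + [("Ruby", False)] * 19
    + [("Perl", True)] * 114 + [("Perl", False)] * 86
)
ratios = license_ratio_by_language(sample)
```

With this input, `ratios["Ruby"]` is 90.5 and `ratios["Perl"]` is 57.0, matching the best and worst ratios mentioned above.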
The conclusion I draw from these numbers is that the situation does not seem to be as bad as Simon indicates in his article or as the survey he got purports. Don't get me wrong, I would really urge GitHub to provide a simple mechanism for license selection when creating a project/repo. I cannot believe that it is so hard for users to choose from a set of options. Simon links to an interesting wizard put up by John Cowan that could serve as a template for this effort.
In the end it would serve everyone - the creators as well as consumers of software - to know the implications of using a certain piece of code; and only a properly licensed project serves this purpose. I personally don't get near a project that doesn't make clear under what terms I can use its code.
P.S.: If you don't already know, I'm a coder so, NO, I didn't collect these numbers by hand. Get the simple Python script I used from here if you'd like to reproduce the graphs above.



Couldn't it be that those users just don't care about how their code is used? For example, because it's a small repo whose code they are not very proud of, etc.
wodim: It can be totally okay to not include licensing terms in your code, esp. when it's just dot files or sth. else with low "threshold of originality".
But: As someone who wants to use code from someone else it's indispensable to know exactly the usage terms, because you don't want to be sued 5 years later for copyright infringement (Unix, anyone?). Thus the conclusion is that as an author, if you want others to use your code under certain conditions, you _must_ include license terms because otherwise no sane programmer will use your code, esp. in a business context.
I agree that it is important to make licensing terms clear for open source code. If you release code as open source you are clearly intending for others to be able to use it, and then you need to be clear about under what terms it can be used.
I found that support for handling licensing information is vague in some IDEs and missing entirely in others. That is why (and now I'm going to push for one of my open source projects :-)) I created https://github.com/tombensve/CodeLicenseManager. It is written in Java, and while it can be used for a project in any language as long as a JVM is available, it shines best when used with Java & Maven, where it can resolve most third party licenses from dependency poms. Here is an example of a markdown document generated by CodeLicenseManager: https://github.com/tombensve/CodeLicenseManager/blob/master/CodeLicenseManager-documentation/docs/licenses.md. It also updates license boilerplate comments in your code.