"[...] as many as half [of the projects] include no easily identifiable copyright licensing information. About 30 percent include some sort of licensing information in the source files, and around 20 percent have a clear license or notice file that makes it obvious under what terms the code is made available."
Only 20 percent of all surveyed projects include proper licensing information in their repo? Not good, folks! What's sad about Simon's article is that he doesn't provide a link to the survey or any further information on how the data was acquired.
Since I'm really interested in the topic and wanted to see some numbers, I rolled my own "survey" - and since I'm an Open Source Guy™, I will provide all the numbers and stuff so you can reproduce them. Let me say this as a motivator: my numbers don't even roughly match those of the mentioned survey. Let's start with the first chart:
It shows the number of "interesting", "popular forked" and "popular starred" GitHub projects that include proper license information in their repo and of those that don't. What does "proper license information" mean? I defined it as this:
The project's repo contains a file named "copying", "copyright" or "*license*" (all case insensitive) in the root folder
The project's repo contains a file named "readme.*" (case insensitive) that in turn contains a section called "license".
This is actually a formalization of common practice in Open Source projects. As you can see in the chart, nearly 140 of the 175 projects analyzed contain such easily findable license information, or more precisely 78%.
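
The two criteria from the definition can be sketched in Python roughly like this. It is not the script mentioned at the end of this post, just a minimal illustration under a few assumptions: either criterion alone is taken to be sufficient, "*license*" is read as "any file name containing license", and a "section called license" is taken to be a line that consists of the word "license" once heading markup is stripped. All function names are made up for this example.

```python
import re

# Criterion 1: a root file named "copying", "copyright" or "*license*".
# Assumption: "*license*" is a wildcard, so any name containing "license" counts.
LICENSE_NAME = re.compile(r"^(copying|copyright)$|license", re.IGNORECASE)

# Criterion 2: a root file named "readme.*" (an extension is required here,
# following the definition literally).
README_NAME = re.compile(r"^readme\..*$", re.IGNORECASE)

# Assumption: heading markup characters to ignore around a section title.
HEADING_CHARS = "#=*_:- \t"


def has_license_file(root_files):
    """True if any root file name matches criterion 1."""
    return any(LICENSE_NAME.search(name) for name in root_files)


def readme_has_license_section(text):
    """True if some line is just the word "license" plus heading markup."""
    return any(
        line.strip(HEADING_CHARS).lower() == "license"
        for line in text.splitlines()
    )


def has_proper_license_info(root_files, readme_texts):
    """root_files: file names in the repo root.
    readme_texts: dict mapping readme file names to their contents."""
    if has_license_file(root_files):
        return True
    return any(
        bool(README_NAME.match(name))
        and readme_has_license_section(readme_texts.get(name, ""))
        for name in root_files
    )
```

For example, `has_proper_license_info(["README.md"], {"README.md": "# Foo\n\n## License\nMIT\n"})` is true because of the readme section, while a repo whose root only holds `copying.txt` would not count under this literal reading of the file names.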
In essence, this analysis of 2000 projects reflects the results from the first chart: 72% of all projects provide proper licensing information, with Ruby projects having the best ratio (90.5%) and Perl having the worst ratio (57%):
The conclusion I draw from these numbers is that the situation does not seem to be as bad as Simon indicates in his article or as the survey he got purports. Don't get me wrong, I would really urge GitHub to provide a simple mechanism for license selection when creating a project/repo. I cannot believe that it is so hard for users to choose from a set of options. Simon links to an interesting wizard put up by John Cowan that could serve as a template for this effort.
In the end it would serve everyone - the creators as well as the consumers of software - to know the implications of using a certain piece of code, and only a properly licensed project serves this purpose. I personally don't go near a project that doesn't make clear under what terms I can use its code.
P.S.: If you don't already know, I'm a coder, so NO, I didn't collect these numbers by hand. Get the simple Python script I used from here if you'd like to reproduce the graphs above.