Which Languages use CI Services on Github?

I analyzed 370 million files in 614k Github repos to find out which Continuous Integration services developers use. I looked at public Github repos that were recorded in the GHTorrent dataset of 2015-01-29 and are not forks of other projects. The dataset is available at the repostruct Github Repository.

tldr: Travis has won the game, but there are still a lot of projects that don’t use CI at all.

Codeship doesn’t appear in this analysis even though it is a major player. This is because there is no config file I can look for in the repository.
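
To illustrate what looking for a config file means, here is a minimal Python sketch. It is not the actual repostruct code. It maps the config file names listed in the Data Quality table to services; the function name `detect_ci_services`, the `root_only` flag and the service label for `.scrutinizer.yml` are assumptions made for this example.

```python
from pathlib import PurePosixPath

# Config file names as listed in the Data Quality table of this post.
# The service labels are descriptive names chosen for this sketch.
CI_CONFIG_FILES = {
    ".travis.yml": "Travis",
    "appveyor.yml": "AppVeyor",
    ".drone.yml": "Drone.io",
    "circle.yml": "CircleCI",
    "wercker.yml": "Wercker",
    "solano.yml": "Solano",
    ".sensiolabs.yml": "SensioLabs",
    ".scrutinizer.yml": "Scrutinizer",  # assumed label
}


def detect_ci_services(paths, root_only=True):
    """Return the set of CI services whose config file appears in `paths`.

    `paths` are repository-relative file paths using "/" as separator.
    With root_only=True (an assumption, not necessarily what the analysis
    did), copies nested in folders such as node_modules are ignored.
    """
    found = set()
    for raw in paths:
        path = PurePosixPath(raw)
        if root_only and len(path.parts) != 1:
            continue  # skip files that are not in the repository root
        service = CI_CONFIG_FILES.get(path.name)
        if service is not None:
            found.add(service)
    return found


if __name__ == "__main__":
    files = ["README.md", ".travis.yml", "node_modules/foo/appveyor.yml"]
    print(detect_ci_services(files))
    print(detect_ci_services(files, root_only=False))
```

A service without such a file, like Codeship, has no entry in the mapping and therefore cannot be detected this way.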

[Charts: CI service usage per language for Javascript, Java, C++, Python, Ruby, Objective-C, PHP, Go, C# and C]

[Chart: CI Integration across Languages]


Findings

Languages

  • Users of dynamic languages use CI services the most. About 30% of projects written in Javascript, Ruby or PHP use a CI service.
  • Pythonistas use CI less than their dynamic companions (only 20%). Perhaps this is because Python is used a lot for scripting, where testing is not as important.
  • Java developers use CI services in only 10% of projects. That is as bad as C programmers. For a language with the most mature tooling this is astonishing. Perhaps this is because they rely on custom builds with Jenkins or TeamCity.
  • C# projects have not yet arrived in the age of CI (less than 5%).
  • Projects with native code like C++ or Objective-C don’t use CI as often, but surprisingly more often than Java. Approximately 15% of these projects use CI. They are also more willing to use CI services other than Travis.
  • The Go community is the most interesting. They use a lot of different services, and Drone.io and Wercker are starting to gain traction among those projects. Go programmers are also exemplary in their usage of CI (40% of projects!).
  • AppVeyor is the only CI service that can compete with Travis, at least on a per-language basis. AppVeyor has some market share in Javascript, Python, C++, C and C# projects due to the fact that they offer Windows-based builds.
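
The percentages above are shares of repositories per language that contain at least one CI config file. A minimal sketch of that aggregation, assuming one `(language, uses_ci)` record per repository (the record shape and the function name `ci_usage_by_language` are hypothetical, and the sample data is made up):

```python
from collections import defaultdict


def ci_usage_by_language(repos):
    """Return {language: fraction of repositories that use a CI service}.

    `repos` is an iterable of (language, uses_ci) pairs, one per repository.
    """
    totals = defaultdict(int)
    with_ci = defaultdict(int)
    for language, uses_ci in repos:
        totals[language] += 1
        if uses_ci:
            with_ci[language] += 1
    return {lang: with_ci[lang] / totals[lang] for lang in totals}


if __name__ == "__main__":
    # Made-up sample records for illustration only, not taken from the dataset.
    sample = [
        ("Go", True), ("Go", False),
        ("Java", False), ("Java", False), ("Java", True), ("Java", False),
    ]
    for language, share in ci_usage_by_language(sample).items():
        print(f"{language}: {share:.0%}")
```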

Services

  • Travis is the #1 for Open Source CI integration and it seems unlikely that this will change in the future. No one comes even close to the numbers Travis has across all languages.
  • AppVeyor is well positioned with its Windows based builds and they might get significant market share in languages where platforms matter like C++, C and C#. These languages still have some room to grow because they don’t have a high CI usage yet (especially C#).
  • Drone is mostly used by Go developers. No wonder as it is based on Docker and Go.
  • Close to no one uses SensioLabs or Solano.
  • Wercker and CircleCI are new kids on the block and therefore might be a bit underrepresented in this analysis. I think it will be tough for them to compete with Travis. They need to convince the users creating new repositories. Looking at the Go community where most projects are very new, we see that Travis alternatives have a better chance.

Conclusion

There is still room for more CI services, but given that not all projects on Github are of a serious nature, more than 50% of projects are unlikely to ever use CI.

My guess is that Travis will remain the big player for Open Source projects in the future followed by AppVeyor. The others will have a little share of the projects even though they have better features than Travis.

Numbers

This is simply an educated guess at how many Open Source repositories use which services. I would be very happy if the people working at those companies could confirm the numbers.

CI Platform   Repos
Travis        200’000+
AppVeyor      50’000+
CircleCI      5’000+
Wercker       5’000+
Drone.io      1’000+

Please note that if projects don’t use a configuration file, I cannot relate them to a CI platform.

Data Quality

Conclusion: The data gathered is fairly accurate. The absolute numbers may be off, but the distribution ratio of CI services is correct.

I verified the results against Google Search and Github Code search. The absolute numbers differ, but the ratio stays the same (probably because I didn’t get the whole Github dataset). If you scale the two Google Search columns by 0.3 and 1.5 and the Github search by 0.4, you get roughly the same results.

Github has many more records for .travis.yml and appveyor.yml than I found. Reasons might be:

  • Github Code search includes forks
  • A repository can have multiple CI config files (a lot of Javascript people have the node_modules folder checked in)

CI config file     repostruct   Google Search   Google Search (intitle)   Github (in path)
.travis.yml        156’597      388’000         69’200                    4’044’062
appveyor.yml       5’950        6’780           1’270                     54’741
.drone.yml         878          2’720           626                       1’084
circle.yml         1’702        8’510           1’150                     4’725
wercker.yml        1’828        5’010           764                       4’999
solano.yml         21           277             6                         26
.sensiolabs.yml    9            3’040           7                         31
.scrutinizer.yml   3’877        9’290           3’010                     13’550
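
One way to check the claim that the ratio stays the same is to compare each config file's share of its column total across the four sources. The following Python sketch does that with the counts from the table above; the helper name `column_shares` and the short column keys are made up for this example.

```python
# Counts copied from the table above, in column order:
# repostruct, Google Search, Google Search (intitle), Github (in path).
# The repostruct value for .travis.yml is read as 156597, i.e. the printed
# digits without the thousands separator (an assumption).
COLUMNS = ("repostruct", "google", "google_intitle", "github_in_path")
COUNTS = {
    ".travis.yml": (156597, 388000, 69200, 4044062),
    "appveyor.yml": (5950, 6780, 1270, 54741),
    ".drone.yml": (878, 2720, 626, 1084),
    "circle.yml": (1702, 8510, 1150, 4725),
    "wercker.yml": (1828, 5010, 764, 4999),
    "solano.yml": (21, 277, 6, 26),
    ".sensiolabs.yml": (9, 3040, 7, 31),
    ".scrutinizer.yml": (3877, 9290, 3010, 13550),
}


def column_shares(counts, column_index):
    """Return each config file's fraction of the total for one column."""
    total = sum(row[column_index] for row in counts.values())
    return {name: row[column_index] / total for name, row in counts.items()}


if __name__ == "__main__":
    for index, column in enumerate(COLUMNS):
        shares = column_shares(COUNTS, index)
        summary = ", ".join(f"{name} {share:.1%}" for name, share in shares.items())
        print(f"{column}: {summary}")
```

With these numbers, .travis.yml accounts for more than 90% of the counted files in every column, while the shares of the smaller services vary more from source to source.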

Thanks

Thanks to @m_st and @sfkeller for providing the monster server for analyzing the results.