Recently, GitHub changed how ATX headers are parsed in Markdown files: the opening `#` characters must now be followed by a space or tab before the header text.
For example, `##Wrong` is no longer rendered as a header; it has to be written as `## Wrong`.
While this change follows the spec, it breaks many existing repositories. I took the README dataset we built at source{d} and ran a simple regexp-based PySpark job over it. It turned out that more than 500,000 repositories have README files with headers that no longer render correctly.
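Below is a minimal sketch of this kind of regexp check, written in plain Python rather than PySpark. The pattern and the `broken_headers` helper are hypothetical illustrations, not the exact job we ran. It flags lines that start (after at most three spaces) with one to six `#` characters followed directly by something other than a space or another `#`. It also skips backtick-fenced code blocks, where `#` usually starts a comment or a shebang.

```python
import re

# Hypothetical pattern: up to 3 leading spaces, 1-6 '#' characters, then a
# character that is neither whitespace nor '#'. Such lines used to render as
# headers but are plain text under the spec, which requires a space or tab.
BROKEN_ATX = re.compile(r"^ {0,3}#{1,6}[^#\s]")


def broken_headers(markdown: str) -> list[str]:
    """Return lines that look like ATX headers missing the space after '#'.

    Simplification: only ``` fences are tracked (not ~~~), and any line whose
    stripped text starts with ``` toggles the fence state.
    """
    found = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # entering or leaving a code block
            continue
        if not in_fence and BROKEN_ATX.match(line):
            found.append(line)
    return found


readme = "##Wrong\n## Right\n```sh\n#!/bin/sh\n```\n#Also wrong\n"
print(broken_headers(readme))  # ['##Wrong', '#Also wrong']
```

In a PySpark job, the bare pattern could be applied per README with `Column.rlike` and the `(?m)` multiline flag, although that would lose the fence handling shown here.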
Among those 500,000 repositories, more than 10,000 have more than 50 stars. The list of these repositories was uploaded to data.world.