@vmarkovtsev
Last active January 1, 2018 16:53

Recently, GitHub changed how ATX headers are parsed in Markdown files: the opening `#` characters must now be followed by a space.

`##Wrong` (no space after the hashes, so it is no longer rendered as a header)

`## Correct`
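The only difference between the two lines is the space after the hashes. The following is a minimal sketch of how a README could be repaired. It is not the readmesfix script mentioned later in this thread. It assumes plain Python with only the `re` module, and the helper name `fix_atx_headers` is hypothetical. It inserts the missing space wherever a line starts with one to six `#` characters followed directly by text, and it skips only ``` fences, not `~~~` or indented code blocks.

```python
import re

# Up to 3 spaces of indentation, then 1-6 '#' characters that are
# immediately followed by something other than '#' or whitespace
# (e.g. "##Wrong"). Under the new rule such a line is a paragraph.
BROKEN_ATX = re.compile(r"^( {0,3})(#{1,6})(?=[^#\s])")


def fix_atx_headers(markdown: str) -> str:
    """Insert the missing space after the opening '#'s of broken headers.

    Lines inside ``` fenced code blocks are left untouched, so that
    lines such as '#include' in C snippets are not rewritten.
    """
    out = []
    in_fence = False
    for line in markdown.splitlines(keepends=True):
        if line.lstrip().startswith("```"):
            # Opening or closing fence: toggle the code-block state.
            in_fence = not in_fence
        elif not in_fence:
            # Keep the indentation and the hashes, then add a space.
            line = BROKEN_ATX.sub(r"\1\2 ", line)
        out.append(line)
    return "".join(out)
```

For example, `fix_atx_headers("##Wrong\n")` returns `"## Wrong\n"`, while `"## Correct\n"` is left unchanged.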

While this change follows the spec, it breaks many existing repositories. I took the README dataset that we created at source{d} and ran a simple regexp-based PySpark job over it. It turned out that more than 500,000 repositories have README files whose headers are no longer rendered correctly.
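The detection step can be approximated on a single machine as shown below. This is a sketch under stated assumptions, not the actual source{d} job. The regular expression, the `(repository, readme_text)` input shape and the function names are all hypothetical. Like any simple regexp, it does not skip fenced code blocks, so lines such as `#include` inside C snippets would also be counted. In PySpark, the same predicate could be passed to `RDD.filter` before calling `count()`.

```python
import re
from typing import Iterable, Tuple

# Hypothetical pattern: a line starting (after at most 3 spaces) with
# 1-6 '#' characters immediately followed by a non-space, non-'#' char.
BROKEN_ATX = re.compile(r"^ {0,3}#{1,6}[^#\s]", re.MULTILINE)


def has_broken_header(readme: str) -> bool:
    """True if some line looks like '##Header' (no space after the '#'s)."""
    return BROKEN_ATX.search(readme) is not None


def count_affected(readmes: Iterable[Tuple[str, str]]) -> int:
    """Count repositories whose README has at least one broken header.

    `readmes` yields (repository, readme_text) pairs; forks are assumed
    to have been filtered out beforehand.
    """
    return sum(1 for _repo, text in readmes if has_broken_header(text))


# Illustrative PySpark equivalent (assuming an RDD of the same pairs):
#   rdd.filter(lambda kv: has_broken_header(kv[1])).count()

sample = [
    ("user/a", "##Install\nRun make."),
    ("user/b", "## Install\nRun make."),
]
print(count_affected(sample))  # → 1
```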

Among those 500,000, more than 10,000 have more than 50 stars. The list of these repositories was uploaded to data.world.

@underyx commented Mar 20, 2017

Maybe include the number of repos that are not forks?

@vmarkovtsev (author)

Sorry, forks are excluded from that number; I forgot to mention it.

@bryant1410

Great! I created a script to fix them based on this list: bryant1410/readmesfix!

@DrPaulBrewer commented Apr 18, 2017

Cool, thanks much. Those GitHub pages looked good when I made them, and then one day they did not.

It is nice to know I am not going mad.
