Split authorship lines from the right instead of from the left
author Nicolas Dandrimont <olasd@softwareheritage.org>
Fri, 19 Oct 2018 15:44:12 +0000 (17:44 +0200)
committer Nicolas Dandrimont <olasd@softwareheritage.org>
Fri, 19 Oct 2018 15:50:56 +0000 (17:50 +0200)
Git authorship lines are in the form 'author Name <em@i.l> timestamp timezone'.
Some clients mess up the 'Name <em@i.l>' part badly, for instance by setting two
email addresses. Splitting identity and timestamp by searching for the '> '
separator from the right instead of the left helps parse some of those messed-up commits.
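A minimal sketch of why the direction of the search matters, using the mangled line from the new test. `split_identity` is a hypothetical helper written for illustration, loosely modelled on the `index`/`rindex` change in the diff; it is not dulwich's actual `parse_time_entry()`. It assumes the identity keeps its trailing `>` (as `expected_identity` in the test does) and that the timestamp/timezone tail never contains `'> '`.

```python
# Hypothetical helper for illustration only; NOT dulwich's implementation.
def split_identity(value: bytes, from_right: bool = True):
    """Split b'Name <email> timestamp timezone' into (identity, rest).

    The identity keeps its trailing b'>'; `rest` is whatever follows the
    b'> ' separator. Returns (value, None) if there is no separator.
    """
    try:
        # rindex() finds the LAST b'> ', index() finds the FIRST one.
        sep = value.rindex(b'> ') if from_right else value.index(b'> ')
    except ValueError:
        return (value, None)
    return (value[:sep + 1], value[sep + 2:])


mangled = (b'Karl MacMillan <kmacmill@redhat.com> <"Karl MacMillan '
           b'<kmacmill@redhat.com>"> 1197475547 -0500')

# Left-to-right search stops inside the identity, leaving junk before the timestamp.
print(split_identity(mangled, from_right=False))
# Right-to-left search isolates the timestamp and timezone.
print(split_identity(mangled))
```

With `from_right=False`, the first `'> '` sits right after the first email address, so the leftover text begins with `<"Karl` and cannot be read as a timestamp. Searching from the right works because, in a well-formed line, the timestamp and timezone are only digits, a sign and a space, so the last `'> '` is always the one closing the identity.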

Such commits still fail the check() method (and they raise a warning in git fsck
upstream as well), but we can at least work with them.

(This edge case brought to you by https://forge.softwareheritage.org/T1280)

dulwich/objects.py
dulwich/tests/test_objects.py

index f79643cfc07a1a57370566c419d828007ee86c27..ced6be9115959041e03bf22c0b78fec8a8ea29e4 100644
@@ -1112,7 +1112,7 @@ def parse_time_entry(value):
     :return: Tuple of (author, time, (timezone, timezone_neg_utc))
     """
     try:
-        sep = value.index(b'> ')
+        sep = value.rindex(b'> ')
     except ValueError:
         return (value, None, (None, False))
     try:
index 05708270a5e6f3323be9ba71eb8f1c32eb002ae5..441f64166f61bc468fa55d68f17f67135634fcfa 100644
@@ -673,6 +673,27 @@ class CommitParseTests(ShaFileCheckTests):
             with self.assertRaises(ObjectFormatException):
                 commit.check()
 
+    def test_mangled_author_line(self):
+        """Mangled author line should successfully parse"""
+        author_line = (
+            b'Karl MacMillan <kmacmill@redhat.com> <"Karl MacMillan '
+            b'<kmacmill@redhat.com>"> 1197475547 -0500'
+        )
+        expected_identity = (
+            b'Karl MacMillan <kmacmill@redhat.com> <"Karl MacMillan '
+            b'<kmacmill@redhat.com>">'
+        )
+        commit = Commit.from_string(
+            self.make_commit_text(author=author_line)
+        )
+
+        # The commit parses properly
+        self.assertEqual(commit.author, expected_identity)
+
+        # But the check fails because the author identity is bogus
+        with self.assertRaises(ObjectFormatException):
+            commit.check()
+
     def test_parse_gpgsig(self):
         c = Commit.from_string(b"""tree aaff74984cccd156a469afa7d9ab10e4777beb24
 author Jelmer Vernooij <jelmer@samba.org> 1412179807 +0200