diff.c: color moved lines differently

When a patch consists mostly of moving blocks of code around, it can
be quite tedious to ensure that the blocks are moved verbatim, and not
undesirably modified in the move. To that end, color blocks that are
moved within the same patch differently. For example (OM, del, add,
and NM are different colors):

    [OM]  -void sensitive_stuff(void)
    [OM]  -{
    [OM]  -        if (!is_authorized_user())
    [OM]  -                die("unauthorized");
    [OM]  -        sensitive_stuff(spanning,
    [OM]  -                        multiple,
    [OM]  -                        lines);
    [OM]  -}

           void another_function()
           {
    [del] -        printf("foo");
    [add] +        printf("bar");
           }

    [NM]  +void sensitive_stuff(void)
    [NM]  +{
    [NM]  +        if (!is_authorized_user())
    [NM]  +                die("unauthorized");
    [NM]  +        sensitive_stuff(spanning,
    [NM]  +                        multiple,
    [NM]  +                        lines);
    [NM]  +}

However adjacent blocks may be problematic. For example, in this
potentially malicious patch, the swapping of blocks can be spotted:

    [OM]  -void sensitive_stuff(void)
    [OM]  -{
    [OMA] -        if (!is_authorized_user())
    [OMA] -                die("unauthorized");
    [OM]  -        sensitive_stuff(spanning,
    [OM]  -                        multiple,
    [OM]  -                        lines);
    [OMA] -}

           void another_function()
           {
    [del] -        printf("foo");
    [add] +        printf("bar");
           }

    [NM]  +void sensitive_stuff(void)
    [NM]  +{
    [NMA] +        sensitive_stuff(spanning,
    [NMA] +                        multiple,
    [NMA] +                        lines);
    [NM]  +        if (!is_authorized_user())
    [NM]  +                die("unauthorized");
    [NMA] +}

If the moved code is larger, it is easier to hide some permutation in the
code, which is why some alternative coloring is needed.

This patch implements the first mode:
* basic alternating 'Zebra' mode
  This conveys all information needed to the user.  Defer customization to
  later patches.

First I implemented an alternative design, which would try to fingerprint
a line by its neighbors to detect if we are in a block or at the boundary.
This idea iss error prone as it inspected each line and its neighboring
lines to determine if the line was (a) moved and (b) if was deep inside
a hunk by having matching neighboring lines. This is unreliable as the
we can construct hunks which have equal neighbors that just exceed the
number of lines inspected. (Think of 'AXYZBXYZCXYZD..' with each letter
as a line, that is permutated to AXYZCXYZBXYZD..').

Instead this provides a dynamic programming greedy algorithm that finds
the largest moved hunk and then has several modes on highlighting bounds.

A note on the options '--submodule=diff' and '--color-words/--word-diff':
In the conversion to use emit_line in the prior patches both submodules
as well as word diff output carefully chose to call emit_line with sign=0.
All output with sign=0 is ignored for move detection purposes in this
patch, such that no weird looking output will be generated for these
cases. This leads to another thought: We could pass on '--color-moved' to
submodules such that they color up moved lines for themselves. If we'd do
so only line moves within a repository boundary are marked up.

It is useful to have moved lines colored, but there are annoying corner
cases, such as a single line moved, that is very common. For example
in a typical patch of C code, we have closing braces that end statement
blocks or functions.

While it is technically true that these lines are moved as they show up
elsewhere, it is harmful for the review as the reviewers attention is
drawn to such a minor side annoyance.

For now let's have a simple solution of hardcoding the number of
moved lines to be at least 3 before coloring them. Note, that the
length is applied across all blocks to find the 'lonely' blocks
that pollute new code, but do not interfere with a permutated
block where each permutation has less lines than 3.

Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is contained in:
Stefan Beller
2017-06-30 13:53:07 -07:00
committed by Junio C Hamano
parent e6e045f803
commit 2e2d5ac184
3 changed files with 602 additions and 15 deletions

View File

@@ -972,4 +972,265 @@ test_expect_success 'option overrides diff.wsErrorHighlight' '
'
test_expect_success 'detect moved code, complete file' '
git reset --hard &&
cat <<-\EOF >test.c &&
#include<stdio.h>
main()
{
printf("Hello World");
}
EOF
git add test.c &&
git commit -m "add main function" &&
git mv test.c main.c &&
test_config color.diff.oldMoved "normal red" &&
test_config color.diff.newMoved "normal green" &&
git diff HEAD --color-moved --no-renames | test_decode_color >actual &&
cat >expected <<-\EOF &&
<BOLD>diff --git a/main.c b/main.c<RESET>
<BOLD>new file mode 100644<RESET>
<BOLD>index 0000000..a986c57<RESET>
<BOLD>--- /dev/null<RESET>
<BOLD>+++ b/main.c<RESET>
<CYAN>@@ -0,0 +1,5 @@<RESET>
<BGREEN>+<RESET><BGREEN>#include<stdio.h><RESET>
<BGREEN>+<RESET><BGREEN>main()<RESET>
<BGREEN>+<RESET><BGREEN>{<RESET>
<BGREEN>+<RESET><BGREEN>printf("Hello World");<RESET>
<BGREEN>+<RESET><BGREEN>}<RESET>
<BOLD>diff --git a/test.c b/test.c<RESET>
<BOLD>deleted file mode 100644<RESET>
<BOLD>index a986c57..0000000<RESET>
<BOLD>--- a/test.c<RESET>
<BOLD>+++ /dev/null<RESET>
<CYAN>@@ -1,5 +0,0 @@<RESET>
<BRED>-#include<stdio.h><RESET>
<BRED>-main()<RESET>
<BRED>-{<RESET>
<BRED>-printf("Hello World");<RESET>
<BRED>-}<RESET>
EOF
test_cmp expected actual
'
test_expect_success 'detect malicious moved code, inside file' '
test_config color.diff.oldMoved "normal red" &&
test_config color.diff.newMoved "normal green" &&
test_config color.diff.oldMovedAlternative "blue" &&
test_config color.diff.newMovedAlternative "yellow" &&
git reset --hard &&
cat <<-\EOF >main.c &&
#include<stdio.h>
int stuff()
{
printf("Hello ");
printf("World\n");
}
int secure_foo(struct user *u)
{
if (!u->is_allowed_foo)
return;
foo(u);
}
int main()
{
foo();
}
EOF
cat <<-\EOF >test.c &&
#include<stdio.h>
int bar()
{
printf("Hello World, but different\n");
}
int another_function()
{
bar();
}
EOF
git add main.c test.c &&
git commit -m "add main and test file" &&
cat <<-\EOF >main.c &&
#include<stdio.h>
int stuff()
{
printf("Hello ");
printf("World\n");
}
int main()
{
foo();
}
EOF
cat <<-\EOF >test.c &&
#include<stdio.h>
int bar()
{
printf("Hello World, but different\n");
}
int secure_foo(struct user *u)
{
foo(u);
if (!u->is_allowed_foo)
return;
}
int another_function()
{
bar();
}
EOF
git diff HEAD --no-renames --color-moved=zebra| test_decode_color >actual &&
cat <<-\EOF >expected &&
<BOLD>diff --git a/main.c b/main.c<RESET>
<BOLD>index 27a619c..7cf9336 100644<RESET>
<BOLD>--- a/main.c<RESET>
<BOLD>+++ b/main.c<RESET>
<CYAN>@@ -5,13 +5,6 @@<RESET> <RESET>printf("Hello ");<RESET>
printf("World\n");<RESET>
}<RESET>
<RESET>
<BRED>-int secure_foo(struct user *u)<RESET>
<BRED>-{<RESET>
<BLUE>-if (!u->is_allowed_foo)<RESET>
<BLUE>-return;<RESET>
<BRED>-foo(u);<RESET>
<BLUE>-}<RESET>
<BLUE>-<RESET>
int main()<RESET>
{<RESET>
foo();<RESET>
<BOLD>diff --git a/test.c b/test.c<RESET>
<BOLD>index 1dc1d85..2bedec9 100644<RESET>
<BOLD>--- a/test.c<RESET>
<BOLD>+++ b/test.c<RESET>
<CYAN>@@ -4,6 +4,13 @@<RESET> <RESET>int bar()<RESET>
printf("Hello World, but different\n");<RESET>
}<RESET>
<RESET>
<BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET>
<BGREEN>+<RESET><BGREEN>{<RESET>
<YELLOW>+<RESET><YELLOW>foo(u);<RESET>
<BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET>
<BGREEN>+<RESET><BGREEN>return;<RESET>
<YELLOW>+<RESET><YELLOW>}<RESET>
<YELLOW>+<RESET>
int another_function()<RESET>
{<RESET>
bar();<RESET>
EOF
test_cmp expected actual
'
test_expect_success 'no effect from --color-moved with --word-diff' '
cat <<-\EOF >text.txt &&
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
EOF
git add text.txt &&
git commit -a -m "clean state" &&
cat <<-\EOF >text.txt &&
simply Lorem Ipsum dummy is text of the typesetting and printing industry.
EOF
git diff --color-moved --word-diff >actual &&
git diff --word-diff >expect &&
test_cmp expect actual
'
test_expect_success 'move detection ignoring whitespace ' '
git reset --hard &&
cat <<\EOF >lines.txt &&
line 1
line 2
line 3
line 4
line 5
line 6
line 7
EOF
git add lines.txt &&
git commit -m "add poetry" &&
cat <<\EOF >lines.txt &&
line 5
line 6
line 7
line 1
line 2
line 3
line 4
EOF
test_config color.diff.oldMoved "magenta" &&
test_config color.diff.newMoved "cyan" &&
git diff HEAD --no-renames --color-moved| test_decode_color >actual &&
cat <<-\EOF >expected &&
<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
<BOLD>index 734156d..eb89ead 100644<RESET>
<BOLD>--- a/lines.txt<RESET>
<BOLD>+++ b/lines.txt<RESET>
<CYAN>@@ -1,7 +1,7 @@<RESET>
<GREEN>+<RESET> <GREEN>line 5<RESET>
<GREEN>+<RESET> <GREEN>line 6<RESET>
<GREEN>+<RESET> <GREEN>line 7<RESET>
line 1<RESET>
line 2<RESET>
line 3<RESET>
line 4<RESET>
<RED>-line 5<RESET>
<RED>-line 6<RESET>
<RED>-line 7<RESET>
EOF
test_cmp expected actual &&
git diff HEAD --no-renames -w --color-moved| test_decode_color >actual &&
cat <<-\EOF >expected &&
<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
<BOLD>index 734156d..eb89ead 100644<RESET>
<BOLD>--- a/lines.txt<RESET>
<BOLD>+++ b/lines.txt<RESET>
<CYAN>@@ -1,7 +1,7 @@<RESET>
<CYAN>+<RESET> <CYAN>line 5<RESET>
<CYAN>+<RESET> <CYAN>line 6<RESET>
<CYAN>+<RESET> <CYAN>line 7<RESET>
line 1<RESET>
line 2<RESET>
line 3<RESET>
line 4<RESET>
<MAGENTA>-line 5<RESET>
<MAGENTA>-line 6<RESET>
<MAGENTA>-line 7<RESET>
EOF
test_cmp expected actual
'
test_expect_success 'move detection with submodules' '
test_create_repo bananas &&
echo ripe >bananas/recipe &&
git -C bananas add recipe &&
test_commit fruit &&
test_commit -C bananas recipe &&
git submodule add ./bananas &&
git add bananas &&
git commit -a -m "bananas are like a heavy library?" &&
echo foul >bananas/recipe &&
echo ripe >fruit.t &&
git diff --submodule=diff --color-moved >actual &&
# no move detection as the moved line is across repository boundaries.
test_decode_color <actual >decoded_actual &&
! grep BGREEN decoded_actual &&
! grep BRED decoded_actual &&
# nor did we mess with it another way
git diff --submodule=diff | test_decode_color >expect &&
test_cmp expect decoded_actual
'
test_done