Differences between revisions 5 and 6
Revision 5 as of 2010-08-30 09:27:46
Size: 7877
Editor: MiriamRuiz
Comment:
Revision 6 as of 2010-10-28 06:19:40
Size: 7953
Editor: ?skizzhg
Comment: dropping English/* prefix to fit wiki.d.o convention (see DebianWomen#migrationTODO)

Theory

Patches are files containing a minimal representation of the differences between two versions of one or more files, and they are often used to handle changes to program source code. Two popular programs are used to work with patches: diff, which creates them, and patch, which applies them. Patches can be stored in different formats: 'normal' diff output, Copied Context Diff, or the most widely used format, Unified Context Diff. Copied Context format is generated by passing the switch "-c" to the diff command, and Unified Context format by passing "-u". Patches are usually stored with the extension ".diff" or ".patch".

The most basic tool for creating patches is the diff command, which compares files line by line and prints the differences. The most common switches you'll use with diff are:

  • -r or --recursive : Recursively compare any subdirectories found.
  • -u -U NUM --unified[=NUM] : Output NUM (default 3) lines of unified context.
  • -N --new-file : Treat absent files as empty.
  • -a --text : Treat all files as text.
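
As a rough illustration of how these switches combine, the sketch below compares two small directory trees. The directory and file names ("old", "new", "greeting.txt", "NOTES") are made up for this example and are not part of the course material.

```shell
# Hypothetical layout: two copies of a tiny source tree, "old" and "new".
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p old/src new/src
printf 'hello\n' > old/src/greeting.txt
printf 'hello world\n' > new/src/greeting.txt
printf 'notes\n' > new/src/NOTES    # this file exists only in "new"

# -r descends into subdirectories, -u asks for unified context, and
# -N treats the file missing from "old" as empty, so it shows up in the patch.
# diff exits with status 1 when it finds differences; that is not an error here.
diff -ruN old new > tree.diff || [ $? -eq 1 ]

cat tree.diff
```

Without -N, diff would only report that NOTES exists in one tree, and the resulting patch could not recreate it.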

When you have a file generated by diff (also called a patch) you can apply it to the original file, or remove it from the final file. This works even if the files have been modified since, as long as the changes in the patch do not conflict with the other changes. If both modify the same line, or lines that are close together, they will conflict, and you may have problems applying the patch.

The usual syntax for applying a patch is: patch -pnum <patchfile. If you know a bit about shell scripting, the symbol "<" means that the contents of the file "patchfile" are streamed to the program through its standard input. If you don't, don't worry too much about it. A common mistake is to forget the "<", in which case the command will sit and wait for input.

The switch -p num (or --strip=num) means that the program must strip the smallest prefix containing num leading slashes from each file name found in the patch file. This controls how file names found in the patch file are treated, in case you keep your files in a different directory than the person who sent out the patch. A sequence of one or more adjacent slashes is counted as a single slash, and a name is ignored if it does not have enough slashes to satisfy the -pnum or --strip=num option.
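
A minimal sketch of the effect of -p, assuming a patch made between two hypothetical directories "hello-1.0" and "hello-1.0.new" (the names are invented for this example):

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p hello-1.0/src hello-1.0.new/src
printf 'hello\n' > hello-1.0/src/greeting.txt
printf 'hello world\n' > hello-1.0.new/src/greeting.txt

# The file names recorded in the patch start with "hello-1.0/" and "hello-1.0.new/".
diff -ruN hello-1.0 hello-1.0.new > fix.diff || [ $? -eq 1 ]

# From inside the source directory, -p1 strips that first path component,
# so patch looks for "src/greeting.txt" relative to the current directory.
cd hello-1.0
patch -p1 < ../fix.diff
cat src/greeting.txt
```

With -p0 the names would be used unmodified, so the same patch would have to be applied from the parent directory instead.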

Exercises

Create a file named "a" that contains the numbers 1 to 10 on separate lines (press Ctrl-D on an empty line to end the input):

  $ cat >a 
  1
  2
  3
  4
  5
  6
  7
  8
  9
  10

Now copy the file to the file "b", and then edit "b" to delete the "2", add "9.5" after the "9" and change "10" to "about 10", so that it looks like:

  1
  3
  4
  5
  6
  7
  8
  9
  9.5
  about 10
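
If you would rather not type the files by hand, the two files can also be produced non-interactively. This is only a convenience sketch; the temporary working directory is an assumption of the example, not something the exercise requires.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# printf repeats its format for every argument, giving one value per line.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > a
printf '%s\n' 1 3 4 5 6 7 8 9 9.5 'about 10' > b

# Both files should contain ten lines.
wc -l a b
```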

Now we can see what the diff of these two files looks like. The most common type of diff to use is the unified context diff, which you get by passing -u to the diff command.

  $ diff -u a b
  --- a 2007-08-31 23:41:02.000000000 +0100
  +++ b 2007-09-01 00:17:08.000000000 +0100
  @@ -1,5 +1,4 @@
   1
  -2
   3
   4
   5
  @@ -7,4 +6,5 @@
   7
   8
   9
  -10
  +9.5
  +about 10

You can see that it has a two line header, with the names of the two files and their timestamps. The next line is called a "hunk header". A hunk is a piece of the diff that deals with changes to one part of the file. We have two hunks in this diff, as we made the changes sufficiently far apart that the diff program considered them to be separate. The hunk header contains information about the line numbers in the two files, which is used when you want to apply these changes elsewhere: "@@ -1,5 +1,4 @@" means that the hunk covers 5 lines starting at line 1 of the first file, and 4 lines starting at line 1 of the second. Below this header is the representation of the change. Each line is prefixed by one character. If this character is a space (like in front of the "1") then the line was unchanged between the files. If the character is a "-" then the line was present in the first file but not the second, i.e. it was deleted. A "+" character in the first column indicates that the line was present in the second file but not the first, i.e. it was added. You can see this at the bottom, where 9.5 was added. Finally, you can see that the change from "10" to "about 10" is represented by deleting the old line and inserting the new one.
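
To get a feel for this structure, the hunk headers and the changed lines can be picked out of a saved diff with grep. The sketch below recreates the two files in a scratch directory (an assumption of the example) and counts what the diff contains.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > a
printf '%s\n' 1 3 4 5 6 7 8 9 9.5 'about 10' > b
diff -u a b > change.diff || [ $? -eq 1 ]

# Hunk headers are the lines that start with "@@".
grep '^@@' change.diff

# Drop the two-line file header, then count lines by their first character.
hunks=$(grep -c '^@@' change.diff)
added=$(sed '1,2d' change.diff | grep -c '^+')
removed=$(sed '1,2d' change.diff | grep -c '^-')
echo "hunks=$hunks added=$added removed=$removed"
```

For these two files the last line printed is "hunks=2 added=2 removed=2".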

Now you can see the other types of diff. I will not go into the details of these types, as they are rarely used. First the copied context diff, using the -c flag:

  $ diff -c a b
  *** a 2007-08-31 23:41:02.000000000 +0100
  --- b 2007-09-01 00:17:08.000000000 +0100
  ***************
  *** 1,5 ****
    1
  - 2
    3
    4
    5
  --- 1,4 ----
  ***************
  *** 7,10 ****
    7
    8
    9
  ! 10
  --- 6,10 ----
    7
    8
    9
  ! 9.5
  ! about 10

and then the 'normal' type, using no flags:

  $ diff a b
  2d1
  < 2
  10c9,10
  < 10
  ---
  > 9.5
  > about 10

Now we will look at how to apply a diff. First we need to save the output of the diff command so that we can use it as input to the patch command.

  $ diff -u a b > change.diff

Now we use the patch command to apply these changes to the "a" file.

  $ patch < change.diff
  patching file a

If you now look at the contents of "a" you will see that they are the same as "b". We can again use diff to check this:

  $ diff -u a b

which produces no output, indicating that the files are identical.

Now we are going to remove the patch from "a" to return it to its original state. To do this we again use patch, with the same diff as before, but pass the "-R" flag, which tells patch to apply the diff in reverse.

  $ patch -R < change.diff
  patching file a

You can check that now "a" just contains the numbers from "1" to "10" again.
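
The whole round trip can also be checked mechanically. The following is a sketch under the assumption that you work in a scratch directory and keep an untouched copy of "a" (called "a.pristine" here, a name invented for the example) to compare against.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > a
printf '%s\n' 1 3 4 5 6 7 8 9 9.5 'about 10' > b
cp a a.pristine                      # reference copy for the final check
diff -u a b > change.diff || [ $? -eq 1 ]

# Apply the patch: "a" should now be identical to "b".
patch < change.diff
cmp a b && echo "applied: a matches b"

# Reverse the patch: "a" should be back to its original contents.
patch -R < change.diff
cmp a a.pristine && echo "reversed: a matches the original"
```

cmp prints nothing and exits with status 0 when the two files are byte-for-byte identical, which makes it convenient for this kind of check.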

Edit the "a" file, and change the first line to say "starting at 1" instead of "1". Now have a look at the diff between "a" and "b":

  $ diff -u a b
  --- a 2007-09-01 00:38:43.000000000 +0100
  +++ b 2007-09-01 00:17:08.000000000 +0100
  @@ -1,5 +1,4 @@
  -starting at 1
  -2
  +1
   3
   4
   5
  @@ -7,4 +6,5 @@
   7
   8
   9
  -10
  +9.5
  +about 10

You can see that this is different to the diff that we have saved in the "change.diff" file. We will now try to apply the original patch again, to see what happens when the file contains changes that the patch does not know about.

  $ patch < change.diff
  patching file a
  Hunk #1 FAILED at 1.
  1 out of 2 hunks FAILED -- saving rejects to file a.rej

The "patch" program did its best to apply the changes and managed to apply the second hunk, but we had edited a line that is part of the first hunk, so it could not tell how to apply that one. It reports this by saying that the hunk failed, and it creates a file, here named "a.rej", that contains the hunk it could not apply. If you open the file "a", you will see that "2" is still there, so the first hunk was not applied. If you open the "a.rej" file, you will see the rejected hunk as a diff (in copied context format with older versions of patch; newer versions may keep the format of the original patch). You will also find the file "a.orig", which contains the contents of "a" from before patch was run, so that you can undo its changes if you want.

Most of the time, though, you will resolve this "conflict" by editing the file and making the changes by hand. This time we would like both changes to be represented in the file "a", so we can simply edit the file and delete the line containing "2". Usually the changes needed to resolve a conflict are more complex, and may involve other changes. Indeed, even if the patch applies with no conflicts, you may still need to make some changes yourself.

As a final step you can look at the diff between "a" and "b" to see what the differences are now. You should see that it simply shows the change of the first line as you would expect.
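
The conflict exercise can be replayed as a script, which also shows how to detect a failed hunk from the exit status. This is a sketch: the scratch directory, the "a.tmp" helper file and the use of sed in place of a text editor are assumptions of the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > a
printf '%s\n' 1 3 4 5 6 7 8 9 9.5 'about 10' > b
diff -u a b > change.diff || [ $? -eq 1 ]

# Local edit that the saved patch does not know about.
sed '1s/.*/starting at 1/' a > a.tmp && mv a.tmp a

# patch exits with status 1 when one or more hunks could not be applied.
status=0
patch < change.diff || status=$?
echo "patch exit status: $status"
ls a.rej                             # the rejected hunk is saved here

# Resolve the conflict by hand: delete the line containing only "2".
sed '/^2$/d' a > a.tmp && mv a.tmp a

# Only the change to the first line should remain.
remaining=$(diff a b) || [ $? -eq 1 ]
printf '%s\n' "$remaining"
```

Checking the exit status like this is useful in scripts, where nobody is watching the "FAILED" message scroll past.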