Re: CRLF fun stuff again...


Subject: Re: CRLF fun stuff again...
From: Cliff Crawford (cjc26@cornell.edu)
Date: Sat Feb 10 2001 - 08:35:44 EST


* Duncan Sinclair <sinclair@dis.strath.ac.uk> wrote:
> >> But the transformation is reversible. the conversion is as follows:
> >>
> >> CR -> LF
> >> LF -> CR
> >>
> >> You do the transformation twice and you get back an identical file.
> >
> >Not true with all files.
>
> Yes true for all files. No matter what the 8-bit values "13" and "10"
> mean, you can do this swap twice and you'll get back the exact same
> file.

That's NOT true. Suppose you had a file with the following data in it:

43 0A A2 7B 0D DD 38 0A 8E 0D 43 2F
   ^^                ^^

There are two LFs already in the file (I underlined them above). Now do
CR->LF conversion:

43 0A A2 7B 0A DD 38 0A 8E 0A 43 2F

And now try to convert back:

43 0D A2 7B 0D DD 38 0D 8E 0D 43 2F
   ^^                ^^

The two LFs that were originally in the file have been converted into
CRs. There is NO way to tell that they were originally LFs. The file
is now irreparably corrupted.
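
The walkthrough above can be reproduced with a few lines of Python. This is
only a minimal sketch: it assumes each conversion is a separate one-way
replacement applied to the whole buffer, and the variable names are
illustrative.

```python
# The twelve bytes from the example above.
original = bytes.fromhex("43 0A A2 7B 0D DD 38 0A 8E 0D 43 2F")

# Note: these are two one-way passes, not a simultaneous CR<->LF swap.

# One-way CR -> LF conversion: every 0x0D becomes 0x0A.
converted = original.replace(b"\r", b"\n")

# One-way LF -> CR conversion: every 0x0A becomes 0x0D, including
# the ones that were LFs in the original data.
restored = converted.replace(b"\n", b"\r")

print(converted.hex(" ").upper())  # 43 0A A2 7B 0A DD 38 0A 8E 0A 43 2F
print(restored.hex(" ").upper())   # 43 0D A2 7B 0D DD 38 0D 8E 0D 43 2F
print(restored == original)        # False
```

After the first pass every CR and LF is the same byte, so the second pass
has no information left to tell them apart.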

> Here's a test with a binary jpeg file....
>
> (Using GNU tr - Sun's "tr" doesn't cope with binary files - if you
> repeat this test make sure you use one that does.)
>
> quartz:~% sum duncan.jpg
> 32290 17 duncan.jpg
>
> Original file's checksum is 32290
>
> quartz:~% tr '\r\n' '\n\r' < duncan.jpg > foo.jpg
> quartz:~% sum foo.jpg
> 13378 17 foo.jpg
>
> Transform once, checksum is 13378
>
> quartz:~% tr '\r\n' '\n\r' < foo.jpg > bar.jpg
> quartz:~% sum bar.jpg
> 32290 17 bar.jpg
>
> Transform a second time, checksum is back to 32290
>
> Convinced yet???

No. You must have got lucky and used a file which didn't have 0x0A in
it.
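
Whether a given file contains any 0x0A or 0x0D bytes is easy to check
directly. Here is a minimal sketch; count_cr_lf is a made-up helper name,
and the sample bytes are the ones from my example above, not the contents
of the jpeg.

```python
def count_cr_lf(data: bytes) -> tuple[int, int]:
    """Return (number of 0x0D bytes, number of 0x0A bytes) in data."""
    return data.count(b"\r"), data.count(b"\n")


# Sample data: the twelve bytes from the example earlier in this message.
sample = bytes.fromhex("43 0A A2 7B 0D DD 38 0A 8E 0D 43 2F")
cr_count, lf_count = count_cr_lf(sample)
print(f"CR: {cr_count}, LF: {lf_count}")  # CR: 2, LF: 2
```

For a real file, read it in binary mode and pass the bytes in, for example
count_cr_lf(open("duncan.jpg", "rb").read()).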

-- 
Cliff Crawford               http://www.people.cornell.edu/pages/cjc26/
                             print "Just another Python hacker"


