ftwin 0.8.1
Usage: ./ftwin [OPTION]... [FILES or DIRECTORIES]...
Find identical files passed as parameter or recursively found in directories.
Mandatory arguments to long options are mandatory for short options too.
 -c, --case-unsensitive       this option applies to regex match.
 -d, --display-size           display size before duplicates.
 -e, --regex-ignore-file      filenames that match this are ignored.
 -f, --follow-symlink         follow symbolic links.
 -h, --help                   display usage.
 -I, --image-cmp              will run ftwin in image cmp mode (using libpuzzle).
 -i, --ignore-list            comma-separated list of file names to ignore.
 -m, --minimal-length         minimum size of file to process.
 -o, --optimize-memory        reduce memory usage, but increase process time.
 -p, --priority-path          file in this path are displayed first when
                              duplicates are reported.
 -r, --recurse-subdir         recurse subdirectories.
 -s, --separator              separator character between twins, default: \n.
 -t, --tar-cmp                will process files archived in .tar default: off.
 -v, --verbose                display a progress bar.
 -V, --version                display version.
 -w, --whitelist-regex-file   filenames that doesn't match this are ignored.
 -x, --excessive-size         excessive size of file that switch off mmap use.
Introduction
ftwin is a tool for finding duplicate files on your file system by comparing their content. You may want to read the related blog entries to understand why this tool was developed and how it works.
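The general idea of content-based duplicate detection can be sketched in Python as follows. This is only an illustration under stated assumptions, not ftwin's actual implementation (ftwin itself is written in C on top of APR): the names file_digest and find_twins, the use of SHA-256 and the 64 KiB block size are all hypothetical choices. Files are grouped by size first, since files of different sizes cannot be identical, and only same-size candidates are hashed.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, block_size=64 * 1024):
    """Return the SHA-256 hex digest of a file, read block by block."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_twins(paths, min_size=0):
    """Group paths whose contents are identical (hypothetical helper).

    Returns a list of lists; each inner list holds two or more paths
    that share the same size and the same content digest.
    """
    by_size = defaultdict(list)
    for path in paths:
        size = os.path.getsize(path)
        if size >= min_size:  # loosely mirrors the idea behind -m
            by_size[size].append(path)

    twins = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in same_size:
            by_digest[file_digest(path)].append(path)
        twins.extend(sorted(group) for group in by_digest.values() if len(group) > 1)
    return twins
```

Equal digests are a very strong hint rather than a proof; a cautious tool could confirm each group with a byte-by-byte comparison before reporting it.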
Download, compilation and installation
Download the latest version here.
Then the classic ./configure && make && make install should work fine on any Linux system. The tool depends on APR, so it may work on Windows too, but I don't have a clue how ;).
Read CHANGES
0.8.1:
- bugfix-major: - compilation failed on the latest Ubuntu.
0.8.0:
- feature-minor: - use a BSD-compatible function from libc in order to
obtain all group permissions of the process.
This may allow Baptiste Daroussin to build a BSD port of
this tool (thanks a lot to him!)
0.7.2:
- bugfix-major: - compilation failed on some architectures because I used APR
primitives instead of off_t and size_t with archive_*
functions.
(thanks to Lou Afonso for the bug report).
0.7.1:
- bugfix-minor: - Fix the size of integer in the hash unit test.
(thanks to Juan M. Bello Rivas for the patch).
0.7.0:
- feature: - Move from libtar to libarchive. If zlib and libbz2 are
present, the corresponding types of archives may be looked
into for duplicate searching.
0.6.0:
- feature: - Add a -t option to include the contents of .tar archives in the
duplicate search; if zlib is present, .tar.gz archives
are automatically processed too.
- cosmetic: - Correctly build manpage using automake.
- legal: - LICENSE file added.
0.5.1:
- bugfix: - permissions are now processed correctly when ftwin collects
file and path information.
- cosmetic: - Add example file, and manpage.
0.5.0:
- feature: - Add a -I option that switches ftwin into duplicate image
finding. In this mode, ftwin uses libpuzzle to search for images that are
copies of each other, even if they have been resized.
0.4.0:
- feature: - Add a whitelisting option: you may want to select only files whose
names match a particular regexp for comparison.
For example, the following line will report duplicate files
whose extension is .txt and that are not in a .svn directory:
./ftwin -e ".*/\.svn/.*" -w ".*\.txt$" -v -r ${HOME}
- bugfix: - the mmap checksum is now computed in blocks of the same size as
the checksum by read.
The checksum values used to differ; now they are the same when
two files are on different types of fs (one mmap-capable, the
other not).
- bugfix-minor: - Fix a minor memleak by attaching pcre * to an apr_pool.
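The interplay of the -e and -w patterns from the 0.4.0 example above can be sketched in Python. This is a hedged illustration, not ftwin's code: ftwin uses PCRE, whereas this sketch uses Python's re module, and it assumes that patterns are searched (unanchored) against the full path. The function keep_file and the sample paths are hypothetical.

```python
import re


def keep_file(path, ignore=None, whitelist=None, case_insensitive=False):
    """Decide whether a path takes part in the comparison.

    ignore:    paths matching this regex are dropped (like -e).
    whitelist: paths NOT matching this regex are dropped (like -w).
    case_insensitive: apply both regexes without regard to case (like -c).
    """
    flags = re.IGNORECASE if case_insensitive else 0
    if ignore is not None and re.search(ignore, path, flags):
        return False
    if whitelist is not None and not re.search(whitelist, path, flags):
        return False
    return True


# Hypothetical sample paths, filtered with the patterns from the example above.
candidates = [
    "/home/joke/notes.txt",
    "/home/joke/project/.svn/entries.txt",
    "/home/joke/photo.png",
    "/home/joke/README.TXT",
]
selected = [p for p in candidates
            if keep_file(p, ignore=r".*/\.svn/.*", whitelist=r".*\.txt$")]
```

Here only /home/joke/notes.txt survives: the .svn entry is dropped by the ignore pattern, and the other two fail the whitelist. Passing case_insensitive=True would also keep README.TXT.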
0.3.1:
- bugfix: - if a file disappears between the collecting phase and the
comparing phase, ftwin just displays a skipping message.
That may happen when a cache is cleaned during an ftwin session.
- bugfix: - if a file is on a device (typically /sys/ or /proc/) that does
not allow mmapping, switch to a more standard read mode.
- bugfix: - Add support for apr-1-config and apu-1-config on Mandriva arch
in acinclude.m4. (reported by Lou Afonso)
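The fallback from mmap to a plain read mode can be sketched in Python as follows. This is an assumption-laden illustration rather than ftwin's implementation: the function names, the SHA-256 digest and the 64 KiB BLOCK_SIZE are hypothetical, and the excess parameter merely mimics the role of -x. Both code paths walk the file in blocks of the same size.

```python
import hashlib
import mmap

BLOCK_SIZE = 64 * 1024  # assumed block size, shared by both code paths


def checksum_by_read(path):
    """Hash a file with ordinary block-wise reads."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(BLOCK_SIZE), b""):
            digest.update(block)
    return digest.hexdigest()


def checksum_by_mmap(path):
    """Hash a file through a read-only memory mapping, block by block."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            for offset in range(0, len(mapped), BLOCK_SIZE):
                digest.update(mapped[offset:offset + BLOCK_SIZE])
    return digest.hexdigest()


def checksum(path, size, excess=50 * 1024 * 1024):
    """Use mmap for non-empty files up to `excess` bytes, plain reads otherwise."""
    if 0 < size <= excess:
        try:
            return checksum_by_mmap(path)
        except (OSError, ValueError):
            pass  # e.g. a file system that does not allow mmap
    return checksum_by_read(path)
```

With a streaming digest such as SHA-256 the result does not depend on the block size, so the two paths always agree here; a checksum that is sensitive to block boundaries would need the shared block size to stay consistent across both paths.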
0.3.0:
- bugfix: - if a file did not have the correct permissions (read bit for
user/group/world) to be read/checksummed, it would crash
ftwin; now ftwin silently (unless in verbose mode) skips the file
instead. Same patch for directories and the execute bit.
- feature: - add the -p implementation: files whose path starts with
the -p parameter are displayed first (example:
-p /home/joke/ will display /home/joke/dup before /etc/dup).
This may be useful to script a deletion, for example.
- bugfix: - big files may overuse memory when read with the mmap syscall (where
it is implemented), so for files whose size > excess (defined by
parameter -x, default 50 MB), use a standard chunked read method.
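The -p ordering described in the 0.3.0 notes above can be illustrated with a short Python sketch. It is not taken from ftwin's source: the function order_twins is hypothetical, and sorting the remaining paths alphabetically is an assumption made only to keep the output deterministic.

```python
def order_twins(paths, priority_path=None):
    """Sort duplicates so that paths under priority_path come first."""
    def sort_key(path):
        preferred = priority_path is not None and path.startswith(priority_path)
        # False sorts before True, so preferred paths lead the list.
        return (not preferred, path)
    return sorted(paths, key=sort_key)


twins = ["/etc/dup", "/home/joke/dup", "/tmp/dup"]
ordered = order_twins(twins, priority_path="/home/joke/")
# ordered[0] is the copy under the priority path; a deletion script
# could keep it and remove the remaining entries.
```

Keeping the preferred copy in a fixed position is what makes the output easy to script against: the first path of each group is the one to keep.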
0.2.0:
- legal: - copyrights added.
- feature: - output reordered by size; add an option to display the sizes.
0.1.0:
- import: Initial version of ftwin, with basic command line options.
Read TODO
- implement CLI options:
1. -c (case-unsensitive) applied to -i, the ignore-list (comma-separated list of
files): switch from hash to array+strcasecmp.
2. -o (optimize-memory): not implemented.
- Add a file to make an exclusion list (.svn CVS etc...).
- use mime-magic to get content type to allow comparison for one type only.
- zlib, lib unzip, lib unrar
- Report whole directory/subdir/files equality, e.g.
/home/joke/tar/httpd-2.0.59/ /tmp/httpd-2.0.59/
instead of listing every file and subdirectory inside each of them.