ftwin: find twin files on your fs.

Introduction

ftwin is a tool that finds duplicate files on your file system by comparing their content. You may want to read the related blog entries to understand why this tool was developed and how it works.
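
To give a rough idea of what "duplicate according to content" means, here is a minimal Python sketch. It is not ftwin's code (ftwin is built on APR): the names find_twins and file_digest, the 64 KiB block size and the choice of SHA-256 are all assumptions made for illustration. Files are first grouped by size, and only files sharing a size are hashed.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, block_size=64 * 1024):
    """Return a SHA-256 hex digest of the file content, read block by block.

    SHA-256 and the block size are illustrative choices, not ftwin's.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_twins(paths):
    """Return groups (sorted lists) of regular files with identical content."""
    # Step 1: group by size, which is cheap and needs no file read.
    by_size = defaultdict(list)
    for path in paths:
        if os.path.isfile(path):
            by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only the files that share their size with another file.
    twins = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a twin
        by_digest = defaultdict(list)
        for path in same_size:
            by_digest[file_digest(path)].append(path)
        twins.extend(sorted(group) for group in by_digest.values() if len(group) > 1)
    return twins
```

Grouping by size first means most files are never read at all, which is probably the main saving on a large tree.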

Download, compilation and installation

Download the latest version here.
Then the classic ./configure && make && make install should work fine on any Linux system. The tool depends on APR, so it may also work on Windows, but I don't have a clue how to build it there ;).

Read Help

ftwin 0.8.5
Usage: ./ftwin [OPTION]... [FILES or DIRECTORIES]...
Find identical files passed as parameter or recursively found in directories.

Mandatory arguments to long options are mandatory for short options too.

-c,     --case-unsensitive      this option applies to regex match.
-d,     --display-size          display size before duplicates.
-e,     --regex-ignore-file     filenames that match this are ignored.
-f,     --follow-symlink        follow symbolic links.
-h,     --help                  display usage.
-I,     --image-cmp             will run ftwin in image cmp mode (using libpuzzle).
-i,     --ignore-list           comma-separated list of file names to ignore.
-m,     --minimal-length        minimum size of file to process.
-o,     --optimize-memory       reduce memory usage, but increase process time.
-p,     --priority-path         file in this path are displayed first when
                                duplicates are reported.
-r,     --recurse-subdir        recurse subdirectories.
-s,     --separator             separator character between twins, default: \n.
-t,     --tar-cmp               will process files archived in .tar default: off.
-v,     --verbose               display a progress bar.
-V,     --version               display version.
-w,     --whitelist-regex-file  filenames that doesn't match this are ignored.
-x,     --excessive-size        excessive size of file that switch off mmap use.
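
How -e, -w and -c are likely to combine can be sketched in a few lines of Python. This is an illustration only: is_selected is a hypothetical name, Python's re module stands in for the PCRE library ftwin uses, and the "search anywhere in the path" matching is an assumption. The patterns in the usage note come from the 0.4.0 changelog example.

```python
import re


def is_selected(path, ignore_regex=None, whitelist_regex=None, case_insensitive=False):
    """Return True if a path would be kept for comparison.

    ignore_regex    plays the role of -e: matching paths are dropped.
    whitelist_regex plays the role of -w: non-matching paths are dropped.
    case_insensitive plays the role of -c and applies to both regexes.
    """
    flags = re.IGNORECASE if case_insensitive else 0
    if ignore_regex is not None and re.search(ignore_regex, path, flags):
        return False
    if whitelist_regex is not None and not re.search(whitelist_regex, path, flags):
        return False
    return True
```

With ignore_regex=r".*/\.svn/.*" and whitelist_regex=r".*\.txt$", the path /home/joke/notes.txt is kept, while /home/joke/.svn/entries.txt and /home/joke/photo.png are dropped.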
 

Read CHANGES

0.8.5:
    - bugfix-major: - File ignore list had no effect.
                      (thanks to Kuat Eshengazin for bug report AND patch !).

0.8.4:
    - bugfix-major: - don't crash when an interface is closed (for example,
                      some files in /sys/class/net/eth0/, such as dormant and
                      carrier, might be read as INVALID).
                      (thanks to Kuat Eshengazin for bug report AND patch !).

0.8.3:
    - bugfix-major: - don't loop infinitely when following symlinks.
                      (thanks to Kuat Eshengazin for bug report AND patch !).

0.8.2:
    - bugfix-minor: - don't stop if a broken link is seen, just report it.
                      (thanks to Imad Soltani for bug report).

0.8.1:
    - bugfix-major: - compilation failed on latest ubuntu.

0.8.0:
    - feature-minor: - use a BSD-compatible function from libc in order to
                       obtain all group permissions of the process.
                       This may allow Baptiste Daroussin to build a BSD port of
                       this tool (thanks a lot to him !)

0.7.2:
    - bugfix-major: - compilation failed on some architectures because I used
                      APR primitive types instead of off_t and size_t with
                      archive_* functions.
                      (thanks to Lou Afonso for bug report).

0.7.1:
    - bugfix-minor: - Fix the size of integer in the hash unit test.
                      (thanks to Juan M. Bello Rivas for the patch).

0.7.0:
    - feature: - Move from libtar to libarchive. If zlib and libbz2 are
                 present, the corresponding types of archives may be looked
                 into for duplicate searching.

0.6.0:
    - feature: - Add a -t option to include the files contained in .tar
                 archives in the duplicate search. If zlib is present,
                 .tar.gz archives are automatically processed too.

    - cosmetic: - Correctly build manpage using automake.

    - legal: - LICENSE file added.

0.5.1:
    - bugfix: - permissions are now processed correctly when ftwin collects
                file and path information.

    - cosmetic: - Add example file, and manpage.

0.5.0:
    - feature: - Add a -I option that switches ftwin into duplicate image
                 finding. In this mode, ftwin searches for images that are
                 copies of each other, even if they are resized, using
                 libpuzzle.

0.4.0:
    - feature: - Add a whitelisting option (-w): only files whose name
                 matches a particular regexp are selected for comparison.
                 For example, the following line will report duplicate files
                 whose extension is .txt and that are not in a .svn directory:
                 ./ftwin -e ".*/\.svn/.*" -w ".*\.txt$" -v -r ${HOME}

    - bugfix: - checksum by mmap is now done in blocks of the same size as
                checksum by read.
                The checksum values used to differ; now they are the same
                even if two files are on different types of fs (one
                mmap-capable, the other not).

    - bugfix-minor: - Fix a minor memleak by attaching pcre * to an apr_pool.

0.3.1:
    - bugfix: - if a file disappears between the collecting phase and the
                comparing phase, ftwin just displays a skipping message.
                That may happen when a cache is cleaned during an ftwin
                session.

    - bugfix: - if a file is on a device (typically /sys/ or /proc/) that does
                not allow mmaping, switch to a more standard read mode.

    - bugfix: - Add support for apr-1-config and apu-1-config on mandriva arch
                in acinclude.m4. (reported by Lou Afonso)

0.3.0:
    - bugfix: - if a file did not have the correct permissions (read bit for
                user/group/world) to be read/checksummed, it would crash
                ftwin; now ftwin silently (unless in verbose mode) skips the
                file instead. Same fix for directories and the execute bit.
    - feature: - add the -p implementation: files in a path prefixed by the
                 -p parameter are displayed first (example:
                 -p /home/joke/ will display /home/joke/dup before /etc/dup).
                 This may be useful to script a deletion, for example.
    - bugfix: - big files may overuse memory when using the mmap syscall where
                it is implemented, so for files whose size > excess (defined
                by parameter -x, default 50 MB), use a standard chunked read
                method.

0.2.0:
    - legal: - copyrights added.
    - feature: - output reordered by size, add an option to display sizes.

0.1.0:
    - import: Initial version of ftwin, with basic command line options.
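
The 0.3.0, 0.3.1 and 0.4.0 entries above describe three related rules: read big files in chunks rather than through mmap, fall back to plain reads where mmap is not allowed, and get the same checksum whichever path is taken. The Python sketch below illustrates that combination under stated assumptions. CRC32 stands in for ftwin's hash, which is not named here. The function names and the 64 KiB block size are hypothetical. The 50 MB threshold mirrors the default quoted for -x in the 0.3.0 entry.

```python
import mmap
import os
import zlib

BLOCK_SIZE = 64 * 1024              # hypothetical block size, shared by both paths
EXCESSIVE_SIZE = 50 * 1024 * 1024   # mirrors the -x default quoted in 0.3.0


def checksum_by_read(path, block_size=BLOCK_SIZE):
    """CRC32 of a file, fed block by block from ordinary reads."""
    crc = 0
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            crc = zlib.crc32(block, crc)
    return crc


def checksum_by_mmap(path, block_size=BLOCK_SIZE):
    """CRC32 of a file, fed block by block from a read-only memory map."""
    crc = 0
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for offset in range(0, len(view), block_size):
                crc = zlib.crc32(view[offset:offset + block_size], crc)
    return crc


def checksum(path, excessive_size=EXCESSIVE_SIZE):
    """Pick mmap for ordinary files, chunked reads for big or unmappable ones."""
    size = os.path.getsize(path)
    if size == 0:
        return 0  # an empty file cannot be mapped; CRC32 of no data is 0
    if size > excessive_size:
        return checksum_by_read(path)
    try:
        return checksum_by_mmap(path)
    except (OSError, ValueError):
        # e.g. a file system that refuses mmap: fall back to plain reads
        return checksum_by_read(path)
```

CRC32 is incremental, so here the result does not depend on the block size. A hash computed per block would need both paths to use the same block size, which is the situation the 0.4.0 fix describes.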
 

Read TODO

- implement cli options:
    1. -c (case-unsensitive) applied to -i: the ignore list (comma-separated
       list of files) would have to switch from hash to array+strcasecmp.
    2. -o (optimize-memory): not implemented.

- Add a file to make an exclusion list (.svn CVS etc...).

- use mime-magic to get content type to allow comparison for one type only.

- zlib, lib unzip, lib unrar

- Report whole directory/subdir/files equality, i.e.
  /home/joke/tar/httpd-2.0.59/ /tmp/httpd-2.0.59/
  Instead of all subfiles/dir of each of these.