Swish-e portable

Swish-e is an indexing engine: very fast and friendly, and I like it a lot.

Some time ago I decided to build a digital library (not a real one, more of a personal collection). It was not made only for me, but also for my university colleagues. It consists of the Xitami web server, Swish-e, Perl and a few other programs. It is a fairly self-contained application for Windows: it can both index and search your local files (you need virtually nothing else to get them indexed). What is more, it is portable, so you can carry your library with you on a CD (the one in my luggage holds 500 MB of texts).

You can download the files from the attachments below. Simply unzip them, run start.exe, and see what it does. If you want to understand it more closely (and I guess someone might), a detailed description of the Swish-eX parts follows:

Installation

Swish-eX without filters is in the sx-base.zip file; it contains Xitami Lite v2.4, Swish-e v2.3 and Autoswish. Everything is preconfigured.

If you also want filters for PDF, DOC and RTF files, download sx-filters.zip and place the whole unzipped directory inside the swish-e directory of an existing Swish-eX installation. You can call the filters from AutoSwish with these directives in a configuration file (see /swish-e/conf/common.config):

FileFilter .pdf '../perl/bin/perl ./filter/pdf2html.pl' '"%P"'
FileFilter .doc '../perl/bin/perl ./filter/antiword.pl' '"%P"'
FileFilter .rtf '../perl/bin/perl ./filter/rtf2htm.pl' '"%P"'
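The three directives form a simple table from file extension to filter command. The Python sketch below mirrors that table to show which command would handle a given document. It is only an illustration, assuming that %P is replaced by the path of the file being indexed; Swish-e does this work itself, and build_filter_command is a hypothetical helper.

```python
import os

# Filter table copied from the three FileFilter directives above:
# file extension -> (filter program, options template).
FILTERS = {
    ".pdf": ("../perl/bin/perl ./filter/pdf2html.pl", '"%P"'),
    ".doc": ("../perl/bin/perl ./filter/antiword.pl", '"%P"'),
    ".rtf": ("../perl/bin/perl ./filter/rtf2htm.pl", '"%P"'),
}


def build_filter_command(path):
    """Command line that would convert `path`, or None if no filter applies.

    Hypothetical helper: assumes %P stands for the path of the document.
    """
    extension = os.path.splitext(path)[1]
    if extension not in FILTERS:
        return None  # no FileFilter is configured for this extension
    program, options = FILTERS[extension]
    return program + " " + options.replace("%P", path)


print(build_filter_command("../docs/report.pdf"))
# ../perl/bin/perl ./filter/pdf2html.pl "../docs/report.pdf"
```

The quotes around %P keep paths with spaces together as one argument, which matters on Windows where "Program Files"-style names are common.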

AutoSwish is in autoswish.zip, and I believe it is easier to configure than the original script.
Place it in the cgi-bin directory of your web server and edit config.pl.

Swish-e search engine and its container Xitami


Swish-e is an open-source program; you can download it from http://swish-e.org. It has been ported to many platforms, and we use the Windows port; Swish-eX ships version 2.3. If you want the newest version, download it and place it in the swish-e directory (only the *.exe and *.dll files are needed).

If you have no experience with Swish-e, Autoswish might help you get started, but I would recommend reading the documentation and developing your own configurations; you will see that the name Swish-e (Simple Web Indexing System for Humans - Enhanced) is truthful.

Note: if you intend to use Swish-e to search indexes on a CD, it is generally better to set the minimum word length higher than the default (which is 1), e.g.:

MinWordLimit 3
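To illustrate what the directive does, here is a simplified Python model: words shorter than the limit are simply not indexed. It is only an approximation (an assumption of this sketch): real Swish-e tokenizing is also controlled by other directives such as WordCharacters, while here a word is just a run of letters.

```python
import re


def indexed_words(text, min_word_limit=1):
    """Words that would be indexed under a given MinWordLimit (simplified)."""
    # A "word" here is a run of letters; Swish-e's real rules differ.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if len(w) >= min_word_limit]


sample = "A CD is an easy way to carry a library"
print(indexed_words(sample))      # default limit 1: all 10 words
print(indexed_words(sample, 3))   # ['easy', 'way', 'carry', 'library']
```

With MinWordLimit 3, the short words "a", "cd", "is", "an" and "to" never enter the index, so there are fewer entries to store and search.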


The Xitami web server is also published under the GNU General Public License. You can get it from http://www.xitami.com; if you read its documentation, you will find that some Xitami features are disabled here.

Xitami is started via x.bat, so you can add your own commands there.

Perl - Perl, again, is open source (you can get it e.g. from http://www.perl.com). Only a small core part of it is provided with Swish-eX: the modules needed for searching and for running the filters. If something doesn't work as expected, please consult cgierr.log. I have little experience with the portability of Perl, but so far Swish-eX has worked everywhere. Still, if you want to add new features and need other Perl files, you will probably have to install Perl and take everything necessary from that installation, not just the missing modules.

Autoswish - web interface for Swish-e

Conversion filters for indexing pdf, doc, rtf


Swish-e uses filters to index non-text files. Here you can find programs for the three most common types:
  • Antiword - for MS Word documents. I am very happy with it; however, you may want to change the output encoding (look at antiword.pl).
  • Pdftotext - excellent for PDF.
  • PHP and rtftohtml - yes, you are reading that right. I am using PHP because I could not find any filter that can process RTF documents in an encoding other than ISO-8859-1. If you know of one, please spread the word.

The output is in ISO-8859-2 encoding; if your documents need something else, change the configuration of each program.

Swish-eX - is responsible for the initial configuration of Xitami and the Swish-e CGI scripts, and makes searching from a CD possible.

swish-ex
  • When you download the package, unzip it and click on start.exe. This is what happens:
    You can choose from two options: either start or install.
      start
      1. If the working directory is writable, Swish-eX assumes that everything is ready and nothing needs to be changed. It starts a program called xstart2.exe, which runs x.bat, finds out the address and port where Xitami is running, and then opens your browser at http://address:port/default.htm.

        Therefore, to have a working collection you must follow some rules, and the first one is: use relative paths. If you set everything up correctly and your application runs from your hard disk with relative paths, it should run elsewhere too, no matter which disk is used. That is why Swish-eX does not change anything at startup. (For examples, look at the first line of the CGI scripts in the cgi-bin directory and at the configuration files in the passw directory.)

      2. If the working directory is not writable (e.g. when running from a CD-ROM), things go differently, because Xitami needs write access to run CGI applications.

        Swish-eX copies some files to $TMP (the Windows temporary directory) and starts Xitami from there. On some old computers this may take more than 5 s, but usually it takes less than 3 s. These files and directories are copied:

        			/xstart.exe
        			/your.cfg   (detected from xitami.cfg)
        			/x.bat
        			/xitami.exe
        			/cgi-dir/*  (*.*)
        			/perl/      (only selected files, 1.7 MB)
        			/swish-e/   (*.dll and *.exe, 1.4 MB)
        			/passw/     (*.*)
        		
        After Xitami has started, the program waits another 2.5 seconds, then reads swish-ex.log, detects the address and port, and finally opens the browser at http://address:port/default.htm.

        It is not quite that straightforward, though: before Xitami is started, the program finds every *.conf file in the passw and cgi-bin directories and changes relative paths to absolute ones. (This is necessary because we are now running Swish-e from the hard disk, while everything except the files listed above stays at the old location.)

        Suppose you have Swish-eX on a CD (drive D:) in the directory myLibrary and run start.exe from there; your working directory is then "D:\myLibrary". The .conf files are changed in this way:

         indexes => "../index/myDoc.swish-e", "../../myFirm/theirIndex.swish-e", 
        goes to
         indexes => "D:/MYLIBRARY/index/myDoc.swish-e", "D:/myFirm/theirIndex.swish-e", 
        Note that the relative path is relative to the location of your .cgi scripts (D:\myLibrary\cgi-bin). Note also that Swish-e uses forward slashes, and you should use them too. Next, the swish-e binary cannot be run from the CD, so the swish_binary directive must not be changed; to keep it as it is, use single quotation marks instead of double ones:
        swish_binary    => '../swish-e/swish-e',
        And a last warning: the shebang (the first line of the CGI files) is relative to the directory from which Xitami was started, so it is relative not to D:\myLibrary\cgi-bin but to D:\myLibrary.

        After this, everything is set up for CGI: when you submit the search form from the browser, Xitami starts the CGI script with Perl (.\perl\bin\perl), Perl finds the .conf files with the right directives, and when you search, Perl starts swish-e from the hard disk, with the indexes pointing to the old location (D:\myLibrary\index\... and D:\myFirm\...).

        Finally, I must mention the configuration file for Xitami (usually xitami.cfg, but detected at runtime). You can play with Xitami as you like, but some directives are "forbidden" (of course, do what you want as long as you know what you are doing... :)

        You must not change:

        [Server]
        webpages=webpages 
        
        [Console]
        capture=1                      #capture console output
        filename=$(TEMP)\swish-ex.log  #to this file
        
        You should also be very careful with the following directives, and test any changes you make to them:
        [Server]
            cgi-bin=cgi-bin                     #   Relative or full path, or '*'
        
        [CGI]
            enabled=1                           #   CGI programs enabled?
            workdir=-                           #   Where CGI scripts run
            msdos-style=0                       #   Use backslash in pathnames
        
        Before x.bat starts, Swish-eX makes some changes to the cfg file so that Xitami can find the necessary files:
        [Server]
        webpages=D:\MYLIBRARY\webpages 
        [Security]
        filename=""
        
        These directives are necessary for Xitami to run (with them set, the web server runs even if you have no cfg file there).

        Phew, I might have forgotten something, but never mind. If you have set everything up so that it works on your hard disk (using relative paths), simply test it by running start.exe with the argument -mode cd:

        start.exe -mode cd
        and watch what happens (if something goes wrong, look inside $TEMP\swish-ex\, usually "C:\windows\temp\swish-ex" or "C:\windows32\temp\swish-ex\").
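The start-up decision in items 1 and 2 can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not the code inside start.exe: writability is tested by trying to create a scratch file, the temporary target is $TEMP\swish-ex (the directory mentioned above), the CGI directory is assumed to be cgi-bin, and only the patterns the list above spells out are copied. The perl/ subset and the detection of your.cfg are not described in enough detail, so they are left out; all function names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical copy list based on the files named above. The CGI
# directory is assumed to be cgi-bin; perl/ (only selected files) and
# your.cfg (detected from xitami.cfg) are deliberately left out.
COPY_PATTERNS = [
    "xstart.exe",
    "x.bat",
    "xitami.exe",
    "cgi-bin/*",
    "swish-e/*.dll",
    "swish-e/*.exe",
    "passw/*",
]


def is_writable(directory):
    """Return True if a scratch file can be created in `directory`."""
    try:
        # TemporaryFile is removed automatically when it is closed.
        with tempfile.TemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        return False


def copy_to_temp(src_root, dest_root):
    """Copy the files matching COPY_PATTERNS, keeping the relative layout."""
    src_root, dest_root = Path(src_root), Path(dest_root)
    for pattern in COPY_PATTERNS:
        for src in src_root.glob(pattern):
            if src.is_file():
                target = dest_root / src.relative_to(src_root)
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, target)


def choose_start_dir(work_dir):
    """Directory Xitami should be started from (item 1 or item 2)."""
    work_dir = Path(work_dir)
    if is_writable(work_dir):
        return work_dir  # item 1: run in place, change nothing
    temp_root = Path(tempfile.gettempdir()) / "swish-ex"
    copy_to_temp(work_dir, temp_root)
    return temp_root  # item 2: run from a copy in the temp directory
```

A trial write is used instead of checking permission flags because, on Windows, directory permission checks do not reliably tell you whether a file can actually be created there.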
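The rewriting of .conf files shown in the indexes example can be sketched as follows. It is a minimal Python model, not Swish-eX's own code. It assumes that only double-quoted values starting with ./ or ../ are rewritten (consistent with the advice to single-quote swish_binary) and that they are resolved against the original cgi-bin directory. Python's ntpath module applies Windows path rules on any platform; unlike the example above, the sketch keeps the original letter case instead of upper-casing the directory name.

```python
import ntpath
import re

# Double-quoted strings starting with ./ or ../ (assumed rewriting rule).
REL_PATH = re.compile(r'"(\.\.?/[^"]*)"')


def absolutize_paths(conf_text, cgi_dir):
    """Turn double-quoted relative paths into absolute forward-slash paths.

    `cgi_dir` is the original location of the CGI scripts, for example
    the cgi-bin directory on the CD. Single-quoted values are not touched.
    """

    def to_absolute(match):
        # Join with the CGI directory and collapse ".." using Windows
        # rules, then switch to the forward slashes Swish-e expects.
        full = ntpath.normpath(ntpath.join(cgi_dir, match.group(1)))
        return '"' + full.replace("\\", "/") + '"'

    return REL_PATH.sub(to_absolute, conf_text)


conf = """indexes => "../index/myDoc.swish-e", "../../myFirm/theirIndex.swish-e",
swish_binary    => '../swish-e/swish-e',
"""
print(absolutize_paths(conf, r"D:\myLibrary\cgi-bin"))
```

Run as a script, it prints the indexes line with "D:/myLibrary/index/myDoc.swish-e" and "D:/myFirm/theirIndex.swish-e", and leaves the single-quoted swish_binary line unchanged.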
      Install
      1. If you want to install Swish-eX onto a disk, choose the "Install" option on the welcome screen. The program asks what you want to do and where, and that is all. Compared to the previous mode, this is child's play: Swish-eX only copies things from one place to another. If you want to copy only the indexing engine and keep the documents on the CD, it changes just one directive in the Xitami cfg file:
        webpages=webpages
        will be changed to
        webpages=D:\myLibrary\webpages
        Searching will be faster, but you need to have the CD in the drive to view the full texts. If you want to do more, I recommend downloading NSIS (Nullsoft Scriptable Install System) and building your own Swish-eX installation (the source code is included).
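Both here and in the CD mode, Swish-eX rewrites individual directives in the Xitami cfg file (webpages in [Server], filename in [Security]). The Python sketch below shows one way to make such an edit line by line so that the rest of the file, including comments, is preserved. It is an illustration only: set_directive is a hypothetical name, keys are matched case-insensitively (an assumption), and an inline comment on a replaced line is dropped.

```python
def set_directive(cfg_text, section, key, value):
    """Set `key=value` in `[section]` of a Xitami-style cfg text.

    A matching key is replaced in place; a missing key is added after
    the last non-blank line of the section; a missing section is
    appended at the end of the file.
    """
    lines = cfg_text.splitlines()
    header = f"[{section}]".lower()
    in_section = False
    insert_at = None  # index just after the section's last non-blank line
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("["):
            in_section = stripped.lower().startswith(header)
            if in_section:
                insert_at = i + 1
            continue
        if not in_section or not stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if "=" in stripped and name.lower() == key.lower():
            # Keep the original indentation of the replaced line.
            indent = line[: len(line) - len(line.lstrip())]
            lines[i] = f"{indent}{key}={value}"
            return "\n".join(lines) + "\n"
        insert_at = i + 1
    if insert_at is None:
        lines += [f"[{section}]", f"{key}={value}"]
    else:
        lines.insert(insert_at, f"{key}={value}")
    return "\n".join(lines) + "\n"
```

For example, set_directive(text, "Server", "webpages", r"D:\myLibrary\webpages") performs the Install-mode change shown above, and set_directive(text, "Security", "filename", '""') adds the [Security] entry used in the CD mode.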

        If you have any questions or comments, I will be happy to read them. Mail me at r.ca(at)post.cz or use one of the forms on the left.
        rca, 2004-04-11

Attachment      Size
swish-ex.zip    2.44 MB
sxfilters.zip   1.15 MB
autoswish.zip   62.3 KB