This file describes how to install the EIAO software on Fedora Core 7.

Download Java from Sun:
    http://java.sun.com/javase/downloads/index.jsp
(Read the license agreement and accept it...)

(1) Prerequisite: Database setup

Make sure that the MySQL and PostgreSQL daemons (the mysqld and postgresql
services) are running. (On the standard FC7 installation the daemons may NOT
be started at boot time.)

Update pg_hba.conf to trust connections from localhost (and any other
computer that should be able to access your databases).
Typically pg_hba.conf should include:

    # TYPE  DATABASE    USER        CIDR-ADDRESS          METHOD
    # "local" is for Unix domain socket connections only
    local   all         all                               trust
    # IPv4 local connections:
    host    all         all         127.0.0.1/32          trust

Note that this allows any user on your system to connect to all your
databases! If this is not an appropriate security policy, refer to the
PostgreSQL manual for other, more restrictive and secure, possibilities.

pg_hba.conf is typically located at /var/lib/pgsql/data/pg_hba.conf

Update postgresql.conf to support 5000 connections.
postgresql.conf should typically include:

    max_connections = 5000
    shared_buffers = 10000

Also increase the amount of resources PostgreSQL can allocate. The exact
numbers depend on your system, but the default configuration is far too
conservative. Try, for example, values like the following (replacing the
shared_buffers value above):

    shared_buffers = 512MB
    work_mem = 64MB
    maintenance_work_mem = 512MB
    checkpoint_segments = 32

postgresql.conf is typically located at /var/lib/pgsql/data/postgresql.conf

For PostgreSQL to work with that amount of resources, the shmmax and sem
parameters of the kernel need to be updated.
Typically, sysctl.conf should include:

    kernel.shmmax=549199872
    kernel.sem=250 32000 32 512

sysctl.conf is typically located at /etc/sysctl.conf

For the new parameters to take effect, run:

    /sbin/sysctl -p

Then restart PostgreSQL:

    /etc/init.d/postgresql restart

Check that PostgreSQL is running OK. This can be done as follows:

    tail -50 /var/lib/pgsql/pgstartup.log

No error messages should be present in this log.
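The relationship between shared_buffers and kernel.shmmax above can be sanity-checked with a few lines of shell. This is only a minimal sketch: the helper name shmmax_ok is made up for illustration, and it assumes that shmmax simply has to be at least as large as shared_buffers, which ignores the other shared memory PostgreSQL allocates.

```shell
#!/bin/sh
# Hypothetical helper (not part of EIAO or PostgreSQL): succeeds when the
# shmmax value ($1, in bytes) is at least as large as the shared memory
# size given in $2 (in bytes).
shmmax_ok() {
    [ "$1" -ge "$2" ]
}

# shared_buffers = 512MB from the example above, expressed in bytes.
shared_buffers_bytes=$((512 * 1024 * 1024))

# kernel.shmmax value suggested for sysctl.conf above.
suggested_shmmax=549199872

if shmmax_ok "$suggested_shmmax" "$shared_buffers_bytes"; then
    echo "shmmax leaves $((suggested_shmmax - shared_buffers_bytes)) bytes of headroom"
else
    echo "shmmax is smaller than shared_buffers" >&2
fi
```

With the values from this file the script reports 12328960 bytes of headroom. To check the value the kernel is actually using, the first argument could be replaced by the output of /sbin/sysctl -n kernel.shmmax.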
Update my.cnf to allow larger packet sizes. Add the following under [mysqld]:

    ...
    set-variable=record_buffer=64M
    set-variable=max_allowed_packet=64M

my.cnf is typically located at /etc/my.cnf

Then restart MySQL:

    /etc/init.d/mysqld restart

(2) Prerequisite: JDK from Sun

Download jdk-1_5_0_09-linux-i586-rpm.bin from
    http://java.sun.com/products/archive/j2se/5.0_09/index.html
(save it in /data/src/)

Alternatively, to install Java 1.6 with yum:

    yum install --enablerepo=jpackage-generic-nonfree jdk1.6.0

(3) Installation of additional software packages

Get the installation script from:
    http://svn.eiao.net/robacc/SystemStart/FC7Install.sh

This script will install additional packages, set up the SQL databases, and
download the EIAO sources.

    sh FC7Install.sh

(4) Install the Data Warehouse

    # Create DW. The python script must be run by the 'postgres' user.
    # 20071102 sigurdkb@uia.no #cd /data/svn/robacc/Datawarehouse/
    # 20071102 sigurdkb@uia.no #sudo su postgres -c "python createeiaodwr20.py"
    cd /data/svn/robacc/Datawarehouse/
    sudo su postgres -c "python createeiaodwr20.py"

(5) Installation and initial setup of EIAO software:

Manually change the passwords in robacc/SystemConfiguration/initial.rdf
These passwords include urlreppassword and dbpassword.
Note that 'removed' is not the intended password.

    cd /data/svn/robacc/
    python setup.py install

During the initial installation you will be asked to choose passwords for
postgres. These should be the same as in initial.rdf

(6) Upgrade of EIAO software:

If the EIAO software is already installed and you wish to upgrade it, go to
the robacc directory:

    cd /data/svn/robacc

And run SVN update:

    svn update

Then deploy the packages with a clean install, to recompile all python
packages:

    python setup.py clean install

(7) Now configure the observatory by editing /etc/eiao/initial.rdf

These services are started automatically on boot, and are automatically
restarted if necessary:

ETL server:
    /etc/init.d/etlserver start

Relaxed WAM server:
    /etc/init.d/relaxedwam start
[Note that to compile the jars for the first time, the WAM might have to be
started as root:
    /data/svn/robacc/WAMs/relaxed_wam/relaxedwam]

Site URL server:
    /etc/init.d/siteurlserver start

Crawlers:
    /etc/init.d/crawlers start

Sampler:
    /etc/init.d/samplingserver start

To populate the URL repository:

    eiaoassess -i directory

e.g.

    eiaoassess -i $EIAOHOME/CapGemini/

Note that the directory must contain CSV files with title and proper NUTS
and NACE categories.

When all these services are running, you can start a crawl by running:

    eiaoassess -r testrunid -t site

This will start a crawl towards all sites in the URL repository.

Alternatively, you can do the following:

    eiaoassess -r testrunid -f filelist

where filelist is a file with a list of URLs:

    http://www.eiao.net
    http://www.norge.no
    ...

Note that the testrunid parameter must be a unique integer identifier for
the testrun you perform, e.g. 200709.
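
The file-based crawl described above could be prepared with a small wrapper like the following. It is a rough sketch only: the helper name is_integer_id and the location of the URL list file are assumptions made for illustration, the file is assumed to hold one URL per line, and the script only prints the eiaoassess command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of EIAO): succeeds only if $1 is a
# non-empty string consisting solely of digits.
is_integer_id() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Example identifier taken from the note above; it must be unique per testrun.
testrun_id=200709

# Hypothetical location for the URL list file.
url_file="${TMPDIR:-/tmp}/eiao_urls.txt"

# Write the two example URLs from above, one per line (assumed layout).
printf '%s\n' 'http://www.eiao.net' 'http://www.norge.no' > "$url_file"

if is_integer_id "$testrun_id"; then
    # Only print the command; remove 'echo' to actually start the crawl.
    echo eiaoassess -r "$testrun_id" -f "$url_file"
else
    echo "testrun id must be an integer: $testrun_id" >&2
fi
```

Checking the identifier before starting avoids launching a crawl with a non-numeric testrun id; whether an id is also unique still has to be verified by hand.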