If I lose the machine, I lose all my data, so the only protection is a backup to persistent storage. S3 is my obvious choice, as no network cost is involved.
- Every 30 minutes, all MySQL tables are backed up to S3 incrementally, containing that day's data only.
- At the end of the day, the whole database is backed up to S3.
- All content is backed up at the end of the day.
So the obvious question: what happens if the server dies between two 30-minute incremental backups? Well, my answer is that Amazon has great power backup, Linux is a stable OS, and my monitoring scripts report the server's health every 5 minutes. At worst I lose 30 minutes of transactions.
Sudden death happens because of accidents; nature doesn't kill anybody suddenly except in natural disasters. What happens if the database server's hard disk fails? What happens if the machine is struck by lightning and fried? I know you have answers to all these questions. In this post I am not inventing a holy 'cure-all' solution; I am building a solution that gives me regular insurance against server issues.
Early on in our production system, the monitoring and backup scripts were once switched off. Suddenly the server hung and became unavailable. We asked on the Amazon forum; the engineers replied that they could not help us. It was a scary situation: we were going to lose 7 days of production data. I felt helpless. Then I rebooted the machine and all my data was still there. So at least rebooting an instance works.
Let's quickly do some backups..
systemDate=`date +%Y-%m-%d`
export TZ=Asia/Calcutta
backupDate=`date +%Y-%m-%d`
backupTime=`date +%H-%M-%S`
deleteStamp=`date -d "2 day ago" +%Y-%m-%d`
machine=`hostname`
hourminute=`date +%H-%M`
Back up all data around midnight. I hope there is less traffic at that time, unless you are selling something... Banks usually pick 3:00 AM for backups. America East Coast, West Coast... the usual fundas. I am on the India time zone and back up at 1:01 AM.
if [[ "$hourminute" = "01-01" ]]
then
#Ready for the master back up
masterFile=master.sql.$backupDate.$backupTime.$machine
mysqldump --compact --skip-create-options --skip-add-drop-table --skip-set-charset --skip-disable-keys --skip-add-locks --no-create-db --no-create-info --skip-lock-tables --user DDD --password=XXX XXX > $masterFile
bzip2 -z $masterFile
hadoop fs -copyFromLocal $masterFile.bz2 /data/$masterFile.bz2
rm -rf $masterFile.bz2
hadoop fs -rm /data/master.sql.$deleteStamp*
Now we can back up all the other files that need to go to Hadoop.
Now let's remove the mails that are older than 7 days.
If you need the mail purging script, please let me know and I will send it to you :)
Now the incremental backup. This is important. Initially each incremental backup held only the last 30 minutes of changes. However, this strategy made recovery difficult, as it required running loads of files. Now, every 30 minutes, I back up all records changed during the current day. So the recovery steps are: load the master backup, then run the latest incremental backup script.
I have a mobile background too. Data synchronization is a big middleware piece in application development. Intellisync, IBM aanywhere and many other vendors provide this middleware. You can use many complex technologies for merging, finding diffs based on the primary key and things like that. What about deletes? You put triggers in place to copy them to another table and then process them from there.
I have made my incremental backup easy. I never delete from the tables. Each row has an active flag, and I set it to false. Purging is an offline process. Some tables do need to be purged eventually. However, people can never reach that information, because the flag stops them from browsing it at the top level. Secondly, each table has an auto-updated timestamp column, and there is an index on it in each table.
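A minimal sketch of how the timestamp column turns into an incremental dump filter. The column name touchtime comes from the script in this post; the function names and the sample date are assumptions for illustration only, and the dump command is printed rather than executed.

```shell
#!/bin/bash
# Build the WHERE clause that selects rows touched since the start of a day.
# "touchtime" is the auto-updated timestamp column used in this post.
incremental_where() {
    local since="$1"          # date in YYYY-MM-DD form
    printf "touchtime > '%s'" "$since"
}

# Print (do not run) the dump command, so the sketch is safe to try anywhere.
# Rows are never deleted, only flagged inactive, and flagging is an update,
# so the same filter also picks up "deleted" rows.
show_incremental_dump() {
    local since="$1" database="$2"
    echo "mysqldump --no-create-info --skip-lock-tables --where=\"$(incremental_where "$since")\" $database"
}

show_incremental_dump 2008-04-01 YYY
```

Because the filter only needs one indexed column, no diffing or delete triggers are required.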
else
#Ready for the incremental backup (every half-hourly run other than the 01:01 master run)
incrementFile=increment.sql.$backupDate.$backupTime.$machine
mysqldump --compact --skip-create-options --skip-add-drop-table --skip-set-charset --skip-disable-keys --skip-add-locks --no-create-db --no-create-info --skip-lock-tables --where="touchtime > '$systemDate'" --user XXX --password=XXX YYY > $incrementFile
bzip2 -z $incrementFile
hadoop fs -copyFromLocal $incrementFile.bz2 /data/$incrementFile.bz2
rm -rf $incrementFile.bz2
hadoop fs -rm /data/increment.sql.$deleteStamp*
fi
Now the backup needs to be in the crontab
crontab -e
1,31 * * * * bin/backup.sh
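For recovery, the two files to load are the newest master and the newest increment. Here is a small sketch of picking them from a file listing, assuming the naming pattern used by the script above. The function name, the machine name web1 and the sample listing are hypothetical.

```shell
#!/bin/bash
# Pick the newest file with a given prefix from a list of names on stdin.
# Names follow the pattern prefix.DATE.TIME.MACHINE.bz2, so a plain
# byte-wise sort orders them by date and then by time.
latest_backup() {
    local prefix="$1"
    grep "^${prefix}\." | LC_ALL=C sort | tail -n 1
}

# Hypothetical listing, standing in for a real listing of /data.
listing="master.sql.2008-04-01.01-01-02.web1.bz2
increment.sql.2008-04-01.23-31-01.web1.bz2
master.sql.2008-04-02.01-01-03.web1.bz2
increment.sql.2008-04-02.09-01-02.web1.bz2
increment.sql.2008-04-02.09-31-01.web1.bz2"

master=$(echo "$listing" | latest_backup master.sql)
increment=$(echo "$listing" | latest_backup increment.sql)
echo "restore $master first, then $increment"
```

This only chooses the files; fetching them and loading them into MySQL is left out.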
If you run into issues anywhere, please let me know.
Express yourself in replies to this post :) - Cheers
How to build a Linux install running on EC2
For information on how to start an EC2 instance and log in to it from your desktop, refer to the Amazon Web Services documentation.
In this blog, I will share my experience of baking a machine image on EC2 with the application servers needed for a Java web stack: Apache, Tomcat, MySQL and pma (a PHP-based MySQL admin tool).
OK, first, the problems I faced:
- When I built my first image, disk usage on the server quickly shot up to 90%. This is because we put all our deployments on sda1. Amazon only gives around 2GB on sda1 and the rest, 158GB, on /mnt. Anything I put on /mnt gets washed away and can't be part of the image.
- Finding the right executables to install.
To address this, the strategy is to install Java and Hadoop on sda1 and the rest under /mnt. Hadoop helps fetch the rest of the deployment pack from S3 to the local /mnt. I assumed this would keep sda1 disk usage to a minimum. However, there were more surprises in store. First let me take you through the steps of building an image.
-
List all available base images: ec2-describe-images.cmd -o amazon
-
I chose ec2-public-images/fedora-8-i386-base-v1.06.manifest.xml amazon
-
Start the instance: ec2-run-instances fedora-8-i386-base-v1.06.manifest.xml -k gsg-keypair
-
mkdir /mnt/install
cd /mnt/install
yum install perl-DBI
-
echo "Get Mysql"
wget http://dev.mysql.com/get/Downloads/MySQL-5.1/MySQL-server-community-5.1.24-0....
wget http://dev.mysql.com/get/Downloads/MySQL-5.1/MySQL-client-community-5.1.24-0....
wget http://dev.mysql.com/get/Downloads/MySQL-5.1/MySQL-shared-community-5.1.24-0....
wget http://dev.mysql.com/get/Downloads/MySQL-5.1/MySQL-devel-community-5.1.24-0.r...
-
rpm -i MySQL-server-community-5.1.24-0.rhel3.i386.rpm
rpm -i MySQL-client-community-5.1.24-0.rhel3.i386.rpm
rpm -i MySQL-devel-community-5.1.24-0.rhel3.i386.rpm
rpm -i MySQL-shared-community-5.1.24-0.rhel3.i386.rpm
-
echo "Now add it to the path"
PATH=$PATH:/usr/bin
-
Harden MySQL: /usr/bin/mysql_secure_installation
-
MySQL gets configured by default to start at boot. This is not optimal. Why? I want the MySQL data directory on /mnt, and once the EC2 instance starts I fetch the rest of the image ball from S3 and unpack it on /mnt. So I went ahead and removed mysql from startup.
-
cd /etc/init.d; mv mysql ~
-
Time to shut down MySQL and move it to /mnt
mysqladmin --user=root --password=jskfk shutdown
-
cp -R /var/lib/mysql /mnt/install
rm -rf /var/lib/mysql
ln -s /mnt/install/mysql /var/lib/mysql
mysqld --user=root &
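The copy, remove and symlink steps above can be wrapped in a small function that stops before deleting anything if the copy fails. This is only a sketch: the function name is hypothetical, using cp -Rp instead of cp -R is my own choice, and it is exercised on a throwaway directory rather than on /var/lib/mysql.

```shell
#!/bin/bash
# Move a directory under a new parent and leave a symlink at the old path.
# Returns early, without deleting anything, if the copy does not succeed.
relocate_dir() {
    local src="$1" dest_parent="$2"
    local name
    name=$(basename "$src")
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest_parent" || return 1
    # -p keeps permissions, timestamps and (when run as root) ownership
    cp -Rp "$src" "$dest_parent/" || return 1
    rm -rf "$src"
    ln -s "$dest_parent/$name" "$src"
}

# Try it on throwaway directories that mimic the layout in this post.
work=$(mktemp -d)
mkdir -p "$work/var/lib/mysql"
echo "data" > "$work/var/lib/mysql/ibdata1"
relocate_dir "$work/var/lib/mysql" "$work/mnt/install"
ls -l "$work/var/lib"
```

After the call, the old path is a symlink and the files are reachable through it, which is what lets MySQL keep its default data directory setting.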
Time to take care of Apache
-
yum install openssl-devel
cd /mnt/install/downloads
wget http://mirror.nyi.net/apache/httpd/httpd-2.2.6.tar.gz
tar -zxvf httpd-2.2.6.tar.gz
cd /mnt/install/downloads/httpd-2.2.6
-
./configure --prefix=/mnt/install/apache --with-mpm=worker \
--enable-so \
--disable-actions \
--disable-alias \
--disable-asis \
--disable-auth \
--disable-autoindex \
--disable-cgi \
--disable-env \
--disable-imap \
--disable-include \
--disable-negotiation \
--disable-status \
--disable-userdir \
--enable-cache=shared \
--enable-disk_cache=shared \
--enable-mem_cache=shared \
--enable-deflate=shared \
--enable-expires=shared \
--enable-filter=shared \
--enable-ext_filter=shared \
--enable-headers=shared \
--enable-proxy=shared \
--enable-rewrite=shared \
--enable-ssl=shared \
--enable-usertrack=shared \
--enable-vhost_alias=shared \
--enable-auth_digest=shared \
--enable-authn_core=shared \
--enable-authz_core=shared \
--enable-authn-dbm=shared \
--enable-authn-anon=shared \
--enable-authn-dbd=shared \
--enable-authn-alias=shared \
--enable-authz-host=shared \
--enable-authz-groupfile=shared \
--enable-authz-user=shared \
--enable-authz-dbm=shared \
--enable-authz-owner=shared \
--enable-setenvif=shared \
--enable-dumpio=shared \
--enable-log_forensic=shared \
--enable-mods-shared=all \
--with-z=/usr/lib \
--with-openssl-libs=/usr/lib
-
make
make install
PATH=$PATH:/mnt/install/apache/bin
-
Next, install the JK connector between Apache and Tomcat. If the links below no longer work, a newer version has probably been released.
-
echo "Install JK"
cd /mnt/install/downloads
wget http://www.apache.org/dist/tomcat/tomcat-connectors/jk/source/tomcat-connecto...
tar -zxvf tomcat-connectors-1.2.26-src.tar.gz
cd /mnt/install/downloads/tomcat-connectors-1.2.26-src/native
./configure --with-apxs=/mnt/install/apache/bin/apxs --enable-EAPI
make
make install
-
Install and configure PHP
-
echo "Get Php"
cd /mnt/install/downloads
wget http://in2.php.net/get/php-5.2.4.tar.gz/from/us.php.net/mirror
tar -zxvf php-5.2.4.tar.gz
cd /mnt/install/downloads/php-5.2.4
yum install libxml2-devel
./configure --with-apxs2=/mnt/install/apache/bin/apxs --with-mysql=/usr/bin/mysql --with-config-file-path=/mnt/install/apache/conf --with-libxml-dir=./ext/libxml
ln -s /usr/lib/libmysqlclient.so /usr/lib/mysql/libmysqlclient.so
make
make install
-
Tomcat runs on Java. We install Java on sda1 because Hadoop also relies on it. Why do we need Hadoop and Java in the main image? Without them we can't reach S3 to fetch the tarball (the mnt folder). Hadoop needs Java internally, so we need both :)
-
echo "Java installation"
Goto http://java.sun.com/javase/downloads/index_jdk5.jsp
Download JDK 5.0 Update 15, Accept the License
Copy the link for jdk-1_5_0_15-linux-i586.bin
cd /mnt/install/downloads
cd /usr/local
sh /mnt/install/downloads/java.bin
mv jdk1.5.0_15 jdk1.5
export JAVA_HOME=/usr/local/jdk1.5
export PATH=$PATH:$JAVA_HOME/bin
-
Install Tomcat now
-
cd /mnt/install/downloads
wget http://apache.mirror.facebook.com/tomcat/tomcat-6/v6.0.14/bin/apache-tomcat-6.0.14.tar.gz
tar -zxvf apache-tomcat-6.0.14.tar.gz
cd /mnt/install
mv /mnt/install/downloads/apache-tomcat-6.0.14 /mnt/install/tomcat
-
Time to set all configuration files
-
Make sure the following lines are in httpd.conf
# Restrict Access to sensitive files
LoadModule php5_module modules/libphp5.so
LoadModule jk_module modules/mod_jk.so
Deny from all
JkWorkersFile /mnt/install/apache/conf/workers.properties
JkShmFile /mnt/logs/apache.mod_jk.shm
JkLogFile /mnt/logs/apache.mod_jk.log
JkLogLevel error
JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "
JkMount /*/*.xml ajp13
# mod_deflate (compress output for browsers that support it)
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/x-javascript text/javascript text/x-js
BrowserMatch ^Mozilla/4 gzip-only-text/html
-
workers.properties settings
worker.loadbalancer.type=lb
worker.list=ajp13
worker.ajp13.port=8009
worker.ajp13.host=localhost
worker.ajp13.type=ajp13
worker.loadbalancer.balance_workers=ajp13
-
I moved all my logs (MySQL, Tomcat, Apache) to /mnt/logs. That requires matching settings in the files in each conf directory.
-
Bounce the servers once:
/mnt/install/tomcat/bin/shutdown.sh
/mnt/install/apache/bin/apachectl stop
/mnt/install/tomcat/bin/startup.sh
/mnt/install/apache/bin/apachectl start
-
Well, phpMyAdmin is a must-have for administering the MySQL database. Here is what I did:
-
cd /mnt/install/downloads
wget http://prdownloads.sourceforge.net/phpmyadmin/phpMyAdmin-2.11.1-all-languages.tar.gz?download
tar -zxvf phpMyAdmin-2.11.1-all-languages.tar.gz
cd /mnt/install/downloads/phpMyAdmin-2.11.1-all-languages
mkdir /mnt/install/apache/htdocs/pma
cd /mnt/install/apache/htdocs/pma
cp -R /mnt/install/downloads/phpMyAdmin-2.11.1-all-languages/* .
cp config.sample.inc.php config.inc.php
vi config.inc.php
change $cfg['blowfish_secret'] =
mkdir config
chmod o+rw config
cp config.inc.php config/
chmod o+w config/config.inc.php
mkdir data
chmod o+rw data
-
Start MySQL if it is not running
-
mysql --password=whrshfuurw
-
GRANT USAGE ON mysql.* TO 'support'@'localhost' IDENTIFIED BY 'xxuser'
;
GRANT SELECT (
Host, User, Select_priv, Insert_priv, Update_priv, Delete_priv,
Create_priv, Drop_priv, Reload_priv, Shutdown_priv, Process_priv,
File_priv, Grant_priv, References_priv, Index_priv, Alter_priv,
Show_db_priv, Super_priv, Create_tmp_table_priv, Lock_tables_priv,
Execute_priv, Repl_slave_priv, Repl_client_priv
) ON mysql.user TO 'support'@'localhost';
GRANT SELECT ON mysql.db TO 'support'@'localhost';
GRANT SELECT ON mysql.host TO 'support'@'localhost';
GRANT SELECT (Host, Db, User, Table_name, Table_priv, Column_priv)
ON mysql.tables_priv TO 'support'@'localhost';
http://website/pma/scripts/setup.php
Config server: port 3306, authentication = http controluser = support passwd
Upload/Download data directory
mv config/config.inc.php . # move file to current directory
chmod o-w config.inc.php # remove world write permissions
-
Log in to the website at ec2-XXXX-XXX-XXX-XXX/pma
-
Let's create some support accounts:
groupadd support
useradd -g support supportadmin
passwd supportadmin
supportadminpass
ssh supportadmin@localhost
-
Never forget to set up .bashrc. This is where I set the time zone to India. Believe me, otherwise date handling just kills you through small slips in development. The servers run in the USA and the users sit in India. Wow. You have to be really careful in every aspect of the application.
export JAVA_OPTS="-Duser.timezone=IST"
export JAVA_HOME=/usr/local/jdk1.5
export PATH=/mnt/install/tomcat/bin:/mnt/install/apache/bin/:$JAVA_HOME/bin:/usr/local/hadoop/bin:$PATH
set -o vi
alias servers='netstat -l -n -p -t -u -w'
alias stop="apachectl stop; shutdown.sh; netstat -l -n -p -t -u -w | grep 8080 | grep LISTEN | sed 's/ [ ]*/ /g' | cut -d' ' -f7 | cut -d'/' -f1 | xargs kill"
alias start="startup.sh;apachectl start"
echo "Welcome aboard.."
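To see why the time zone setting matters, here is a small sketch that formats one instant for a UTC server and for India. It assumes GNU date (the -d option, which the backup script in this blog also uses). The function name and the sample timestamp are hypothetical, and the POSIX-style zone string IST-5:30 is used so the sketch does not depend on the zone database being installed.

```shell
#!/bin/bash
# Format a Unix timestamp as a date plus hour-minute stamp in a given zone.
stamp_in_zone() {
    local zone="$1" epoch="$2"
    TZ="$zone" date -d "@$epoch" "+%Y-%m-%d %H-%M"
}

# The same instant seen from a UTC server and from India (UTC+5:30).
# In a POSIX zone string the sign is inverted: "IST-5:30" means UTC+5:30.
stamp_in_zone "UTC0" 1207000800
stamp_in_zone "IST-5:30" 1207000800
```

The two lines printed fall on different calendar days, which is exactly the kind of slip that ends up in file names and date filters if the zone is not set in one place.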
-
A little hardening of the servers
-
netstat -l -n -p -t -u -w
iptables --list
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -m state --state NEW,RELATED,ESTABLISHED -j ACCEPT
iptables -A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp -m tcp --dport 443 -j ACCEPT
iptables -A INPUT -s 127.0.0.1 -j ACCEPT
iptables -A OUTPUT -p tcp -m tcp --dport 22 -j DROP
iptables -A INPUT -j DROP
iptables -A FORWARD -j DROP
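Since a wrong firewall rule can lock you out of a remote machine, it can help to generate the inbound rules and read them before running them. A minimal sketch, assuming the same default-deny approach as above; the function name is hypothetical and nothing here touches the live firewall.

```shell
#!/bin/bash
# Print the iptables commands for a default-deny inbound policy that admits
# only the listed TCP ports. Printing instead of executing lets the rules be
# reviewed before they reach a live machine.
emit_inbound_rules() {
    echo "iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT"
    echo "iptables -A INPUT -s 127.0.0.1 -j ACCEPT"
    local port
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp -m tcp --dport $port -j ACCEPT"
    done
    # The catch-all DROP must come last: rules are matched in order.
    echo "iptables -A INPUT -j DROP"
}

emit_inbound_rules 22 80 443
```

Once the output looks right, it can be piped to sh as root.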
-
vi /mnt/install/apache/htdocs/php.ini
expose_php = Off
display_errors = Off
log_errors = On
error_log = /mnt/logs
-
Put the Hadoop in sda1.
cd /mnt/install/downloads
wget http://apache.tradebit.com/pub/hadoop/core/stable/hadoop-0.15.3.tar.gz
gunzip hadoop-0.15.3.tar.gz
cd /usr/local
tar -xvf /mnt/install/downloads/hadoop-0.15.3.tar
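Remove the docs and src directories from the unpacked Hadoop tree.
Two more lines that belong with the mod_deflate settings in httpd.conf:
BrowserMatch ^Mozilla/4\.[0678] no-gzip
BrowserMatch \bMSIE\s7 !no-gzip !gzip-only-text/html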
I deleted the download directories and log files, tarred up the whole of /mnt/install and sent it to S3 using Hadoop. After this I stopped the servers, cleaned up /mnt and cut an image.
The next time I built, since Java and Hadoop were in /usr/local, I was able to retrieve my deploy ball from S3 and unpack it on /mnt. I was ready to go, with all servers starting successfully. However, another surprise was waiting for me: I suddenly saw disk usage shooting up and couldn't figure out why. It was coming from /usr/bin.
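A quick way to catch this kind of surprise early is to parse df output and flag any mount point above a threshold. This is a minimal sketch: the function names are hypothetical and the sample numbers are made up to resemble the situation described here.

```shell
#!/bin/bash
# Read `df -P` output on stdin and print the use percentage of one mount point.
usage_percent() {
    local mount="$1"
    awk -v m="$mount" 'NR > 1 && $6 == m { sub(/%/, "", $5); print $5 }'
}

# Read `df -P` output on stdin and print mount points at or above a limit.
over_threshold() {
    local limit="$1"
    awk -v limit="$limit" 'NR > 1 { pct = $5; sub(/%/, "", pct); if (pct + 0 >= limit) print $6 }'
}

# Sample output in `df -P` layout (the numbers are invented for illustration).
sample="Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 10321208 9289087 507833 90% /
/dev/sda2 153899044 192072 145889348 1% /mnt"

echo "$sample" | usage_percent /
echo "$sample" | over_threshold 80
# On a live machine: df -P | over_threshold 80
```

Once a mount point is flagged, something like du -sk /usr/* | sort -n shows which directory is responsible.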
Now it was time to move that out. A soft link was my solution.
echo "Moving out the /usr directory from sda1 to mnt"
cp -R /usr /mnt
rm -rf /usr
ln -s /mnt/usr /usr
cd /usr/bin
This reduced my disk usage to 25% but created another set of issues. Inside /usr there were many relative soft links like ../../. So finding each one that was broken and fixing it was my job :(
rm -f env; ln -s /bin/env env
rm -f Mail; ln -s /bin/mail Mail
rm -f awk; ln -s /bin/gawk awk
rm -f gettext; ln -s /bin/gettext gettext
rm -f gunzip; ln -s /bin/gunzip gunzip
rm -f gzip; ln -s /bin/gzip gzip
rm -f cut; ln -s /bin/cut cut
rm -f kill; ln -s /bin/kill kill
rm -f gawk; ln -s /bin/gawk gawk
cd /usr/lib
rm -f libdb-4.6.so; ln -s /lib/libdb-4.6.so libdb-4.6.so
rm -f libselinux.so; ln -s /lib/libselinux.so.1 libselinux.so
rm -f libsepol.so; ln -s /lib/libsepol.so.1 libsepol.so
rm -f libdl.so; ln -s /lib/libdl.so.2 libdl.so
rm -f libm.so; ln -s /lib/libm.so.6 libm.so
rm -f libssl.so; ln -s /lib/libssl.so.0.9.8b libssl.so
rm -f libBrokenLocale.so; ln -s /lib/libBrokenLocale.so.1 libBrokenLocale.so
rm -f libthread_db.so; ln -s /lib/libthread_db.so.1 libthread_db.so
rm -f libutil.so; ln -s /lib/libutil.so.1 libutil.so
rm -f libnsl.so; ln -s /lib/libnsl.so.1 libnsl.so
rm -f libnss_compat.so; ln -s /lib/libnss_compat.so.2 libnss_compat.so
rm -f libnss_db.so; ln -s /lib/libnss_db.so.2 libnss_db.so
rm -f libnss_dns.so; ln -s /lib/libnss_dns.so.2 libnss_dns.so
rm -f libnss_files.so; ln -s /lib/libnss_files.so.2 libnss_files.so
rm -f libnss_hesiod.so; ln -s /lib/libnss_hesiod.so.2 libnss_hesiod.so
rm -f libnss_nis.so; ln -s /lib/libnss_nis.so.2 libnss_nis.so
rm -f libnss_nisplus.so; ln -s /lib/libnss_nisplus.so.2 libnss_nisplus.so
rm -f libanl.so; ln -s /lib/libanl.so.1 libanl.so
rm -f libz.so; ln -s /lib/libz.so.1.2.3 libz.so
rm -f libcidn.so; ln -s /lib/libcidn.so.1 libcidn.so
rm -f libresolv.so; ln -s /lib/libresolv.so.2 libresolv.so
rm -f libcrypt.so; ln -s /lib/libcrypt.so.1 libcrypt.so
rm -f libcrypto.so; ln -s /lib/libcrypto.so.0.9.8b libcrypto.so
rm -f librt.so; ln -s /lib/librt.so.1 librt.so
cd /usr/lib/lsb/
rm -f install_initd; ln -s /sbin/chkconfig install_initd
rm -f remove_initd; ln -s /sbin/chkconfig remove_initd
cd /usr/sbin
rm -f accton; ln -s /sbin/accton accton
rm -f kudzu; ln -s /sbin/kudzu kudzu
rm -f rcmysql; ln -s /etc/init.d/mysql rcmysql
rm -f hwclock; ln -s /sbin/hwclock hwclock
cd /usr/share/terminfo/a
rm -f ansi; ln -s /lib/terminfo/a/ansi ansi
cd /usr/share/terminfo/d
rm -f dumb; ln -s /lib/terminfo/d/dumb dumb
cd /usr/share/terminfo/l
rm -f linux; ln -s /lib/terminfo/l/linux linux
cd /usr/share/terminfo/v
rm -f vt100-am; ln -s /lib/terminfo/v/vt100-am vt100-am
rm -f vt100; ln -s /lib/terminfo/v/vt100 vt100
rm -f vt100-nav; ln -s /lib/terminfo/v/vt100-nav vt100-nav
rm -f vt200; ln -s /lib/terminfo/v/vt200 vt200
rm -f vt220; ln -s /lib/terminfo/v/vt220 vt220
cd /usr/share/terminfo/x
rm -f xterm; ln -s /lib/terminfo/x/xterm xterm
rm -f xxterm; ln -s /lib/terminfo/x/xxterm xxterm
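Rather than hunting for the broken links by hand, find can list every symlink whose target no longer exists. A minimal sketch with a hypothetical function name, demonstrated on a throwaway tree instead of the real /usr:

```shell
#!/bin/bash
# List symbolic links under a directory whose targets no longer exist.
# `test -e` follows the link, so it fails for a dangling one.
broken_links() {
    local dir="$1"
    find "$dir" -type l ! -exec test -e {} \; -print
}

# Throwaway tree: one relative link that resolves, one that dangles.
demo=$(mktemp -d)
mkdir -p "$demo/bin" "$demo/usr/bin"
touch "$demo/bin/gawk"
ln -s ../../bin/gawk "$demo/usr/bin/awk"      # resolves inside $demo
ln -s ../../bin/missing "$demo/usr/bin/env"   # target does not exist
broken_links "$demo/usr"
```

Running broken_links /usr after the move would produce the list of links that need to be recreated with absolute targets.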
---------------------------------------
Now things are good. Let's grab the ball from S3.
/usr/local/hadoop/bin/hadoop fs -ls /image
/usr/local/hadoop/bin/hadoop fs -copyToLocal /image/nn.tar .
tar xvf nn.tar
Finally, if you need the mail server and it is not running, start it:
echo "start the mail server if not running"
listeningports=`netstat -l -n -p -t -u -w | sed 's/ [ ]*/ /g' | cut -d' ' -f4`
# Match port 25 exactly, not any address that merely contains "25"
isListening=`echo $listeningports | grep -E ':25( |$)' | wc -l`
if [ $isListening -ne 1 ]; then
sendmail -bd &
fi
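The port check can also be written as a function that matches the port number exactly, so that 25 is not confused with 8025 or 2525. A minimal sketch: the function name and the sample netstat output are hypothetical, and the sendmail command is only echoed.

```shell
#!/bin/bash
# Read `netstat -l -n -t` style output on stdin and succeed if some socket
# in the LISTEN state is bound to exactly the given port.
is_listening() {
    local port="$1"
    awk -v p="$port" '
        $6 == "LISTEN" { n = split($4, part, ":"); if (part[n] == p) found = 1 }
        END { exit !found }'
}

# Sample output: port 8025 is open, port 25 is not.
sample="Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:8025 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:3306 0.0.0.0:* LISTEN"

if echo "$sample" | is_listening 25; then
    echo "mail server is already listening"
else
    echo "would run: sendmail -bd"
fi
```

On a live machine the input would come from netstat -l -n -t instead of the sample text.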
---------------------------------------
Was this blog a little too detailed? Well, I wanted to give enough detail that anyone wanting to start an EC2 instance can cut and paste.
Later I will blog on what to do when an instance is lost, hangs, or fails in other ways. Interestingly, we were able to recover an instance after the Amazon support team gave up.
If you are trying out these commands, cool. Let us know how you got on. If you have read this far, I acknowledge your patience. Let me know your views.
To avoid spam, I am giving our mail address in an image; see below or at the top. Write a comment here or send a mail. Tomorrow I will share how we use EC2 for production hosting rather than for non-critical activities such as testing and backup.