Sysadmin Guide¶
“Everyone has a test environment… not everyone is lucky enough to have a separate production environment.”
Application Cluster¶
The main application cluster is made up of 3 physical machines managed by Proxmox. With Proxmox it’s easy to spin up a new VPS from a template, move a VPS from one physical machine in the cluster to another, power machines down or reboot them, and perform other common sysadmin operations.
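For reference, the same operations can also be performed from a shell on one of the Proxmox nodes using the qm tool. A minimal sketch (the VM IDs, names and node names are placeholders):
qm clone <template-vmid> <new-vmid> --name <new-vm-name>   # spin up a new VPS from a template
qm migrate <vmid> <target-node> --online                   # move a running VPS to another node
qm shutdown <vmid>                                         # power a VPS down
qm start <vmid>                                            # start it again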
Proxmox’ web admin interface¶
You can access the Proxmox admin interface using one of the following URLs.
You have to be connected to the VPN in order to access them. Access to the web admin interface is restricted by the following constraints:
username and password pair
OTP
Database Clusters¶
We have highly redundant PostgreSQL, PGPool-II and MariaDB database clusters, which use PostgreSQL-XL, PGPool-II and MariaDB Galera respectively.
You can read more about the database clusters (and how to fix them in case of problems) in their documentation section: DBCL Cluster
Resizing Filesystems¶
From time to time disk space may run low on a server, so you’ll have to resize the disk ONLINE (i.e. without shutting down the server). Don’t worry, it’s easy if you follow the guide below. Please note that the guide only applies if you created the VM from one of the provided templates (slackware or devuan). You can still manage if you created the VM manually, but the resize section may not work as described.
To start, log in to one of the Proxmox administration consoles, then go to the VM whose disk you want to resize. Click on the Hardware link and you’ll be presented with the virtual hardware of that VM. One of the lines will read something like this:
Hard Disk (scsiX) VM-Pool_1_vm:vm_xxx-disk-x,size=XXXG
Click on it, then on the Resize Disk button above. Enter the number of GB you want the disk to GROW by. Normally you’ll want to resize the root filesystem (the template by default does not create additional filesystems for /var, /home, etc., so you’ll just have /boot, / and the swap).
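If you prefer the CLI, the same grow operation can be done from a shell on the Proxmox node hosting the VM. A sketch, assuming the scsi0 disk and a 10 GB increase (the VM ID, disk name and size are placeholders):
qm resize <vmid> scsi0 +10G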
Now log in on the server with a root account. At this point, if for example you resized the scsi0 disk, you can do
fdisk /dev/sda
(/dev/sdb corresponds to scsi1, and so on.) You will read:
Command (m for help):
Type p at this command prompt and hit return.
You will see something like this:
Welcome to fdisk (util-linux 2.27.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): p
Disk /dev/sda: 32 GiB, 34359738368 bytes, 67108864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x72e3860a

Device     Boot    Start      End  Sectors  Size Id Type
/dev/sda1           2048  1640447  1638400  800M 83 Linux
/dev/sda2        1640448 10029055  8388608    4G 82 Linux swap
/dev/sda3       10029056 67108863 57079808 27.2G 83 Linux
If we resized /dev/sda, remember that, when using one of the provided templates, the ROOT filesystem is always on the LAST partition of the disk. Now you can delete the last partition (/dev/sda3). Don’t worry, you will not lose data.
Command (m for help): d
Partition number (1-3, default 3): 3
Partition 3 has been deleted.
Then immediately re-create it with default size:
Command (m for help): n
Partition type
p primary (2 primary, 0 extended, 2 free)
e extended (container for logical partitions)
Select (default p): p
Partition number (3,4, default 3):
First sector (10029056-67108863, default 10029056):
Last sector, +sectors or +size{K,M,G,T,P} (10029056-67108863, default 67108863):
Created a new partition 3 of type 'Linux' and of size 27.2 GiB.
Finally, write your changes to disk. You will certainly see a warning:
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Re-reading the partition table failed.: Device or resource busy
The kernel still uses the old table. The new table will be used at the next reboot or after you run partprobe(8) or kpartx(8).
Now you have to tell the kernel you’ve changed something. In this case, the disk is /dev/sda, so:
partx -u /dev/sda
Now the kernel knows the partition table has changed. Finally, we can resize the filesystem!
resize2fs /dev/sda3
Wait a moment, and you’ll see that the filesystem has grown!
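A quick way to confirm the new size (a standard check, assuming the root filesystem is the one you resized):
df -h /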
Backup¶
The servers are backed up daily using Bacula.
You can access its web interface at https://backup-admin (while connected to the VPN).
User: bacula
Password: ask the BOFH
Putting a server on backup¶
To put a server on backup, start by cloning the slackware-template or the devuan-template on the AppCluster. After you’ve done all the necessary configuration, edit the following file on the new server:
/etc/bacula/bacula-fd.conf
and change the relevant configuration elements:
Change every occurrence of dbcl03 to the hostname of the new server
If you want, change FDAddress to the internal server address (this is firewalled anyway)
Change the Password for the Director (Name = storage-dir); a reference snippet is sketched after the command below. A useful command to generate a random password is:
openssl rand -base64 33
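For reference, the Director block you are changing in /etc/bacula/bacula-fd.conf looks roughly like this (a sketch; the password placeholder is the one you just generated):
Director {
  Name = storage-dir
  Password = "<the-password-you-generated>"
}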
Remember that this same password must also be configured on the Bacula server. Once you are done, restart bacula-fd with one of the following commands (the first on a Slackware clone, the second on a Devuan one):
/etc/rc.d/rc.bacula restart
service bacula-fd restart
You can now configure the Bacula server. Go to storage.webmonks.net and open the file /etc/bacula/bacula-dir.conf.
Here is a snippet of what you should configure in the job section (search for the first “Job {” string):
Job {
Name = "servername-backup"
JobDefs = "Backup filesystem full no TS"
Client = servername
}
And, in the clients section (search for the first “Client {” string):
Client {
Name = servername
Address = servername.localmonks.net
FDPort = 9102
Catalog = monk-catalog
Password = "<the-password-you-configured-on-server>"
File Retention = 15 days
Job Retention = 1 month
AutoPrune = yes
}
After you’ve finished, you can launch the command:
bconsole
And type:
reload
Be careful: if there are errors, bconsole will HANG. If you’re cautious, you can first check for errors by typing:
bacula-dir -Ttc /etc/bacula/bacula-dir.conf
And then, if all is OK, you can reload. After you have reloaded Bacula, you can try running your shiny new job in bconsole by typing:
run
And follow the menu.
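To check how the job went, the standard bconsole commands are (a quick sketch):
messages
list jobs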
Restoring a server from backup¶
Let’s hope you aren’t in need of this. But if you are, here’s what you have to do.
Please notice that if you are FULL-RESTORING a server, you have to reinstall a base system from the same template as the server you are going to restore. You can check which template it was on the appcl cluster console, or you can recognize it from the backup itself by doing a quick lookup in the restore operation described below (without actually running it). I use the hostname file to tell them apart: if the backup contains /etc/HOSTNAME you have a Slackware 14.2 system, otherwise you have a Devuan system.
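A sketch of that quick lookup, done inside the bconsole restore pseudo-shell described below (it only uses the cd and ls navigation commands):
$ cd /etc
$ ls
If the listing contains HOSTNAME (uppercase), the backup comes from a Slackware 14.2 template; if it only contains hostname, it is a Devuan one.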
Log in on the “storage” server and run the command:
bconsole
At the bconsole prompt, type:
restore
You will be presented with this menu:
To select the JobIds, you have the following choices:
 1: List last 20 Jobs run
 2: List Jobs where a given File is saved
 3: Enter list of comma separated JobIds to select
 4: Enter SQL list command
 5: Select the most recent backup for a client
 6: Select backup for a client before a specified time
 7: Enter a list of files to restore
 8: Enter a list of files to restore before a specified time
 9: Find the JobIds of the most recent backup for a client
10: Find the JobIds for a backup for a client before a specified time
11: Enter a list of directories to restore for found JobIds
12: Select full restore to a specified Job date
13: Cancel
You have several ways to restore data from a backup; let’s follow the easiest one for now and choose option 5. You will then see the list of available clients: each one is a server you have put on backup. Choose the server you want to restore. If all went well, you will be presented with a pseudo-shell, where you can navigate using cd and ls, and where you can type the special command add to add directories or files to the list of files you want to restore.
You are now entering file selection mode where you add (mark) and
remove (unmark) files to be restored. No files are initially added, unless
you used the "all" keyword on the command line.
Enter "done" to leave this mode.
cwd is: /
$ ls
bin/
boot
dev
etc/
home
initrd.img
lib/
lib64/
media/
misc
net
opt/
root/
run
sbin/
srv
usr/
var/
vmlinuz
Choose which files you want to restore (or type add all to choose them all), and then type done.
You will be presented with this menu:
1 file selected to be restored.
Using Catalog "monk-catalog"
Run Restore job
JobName: Restore backup files
Bootstrap: /var/lib/bacula/working/storage-dir.restore.1.bsr
Where: /tmp/bacula-restores
Replace: Always
FileSet: <type-of-fileset (see below)>
Backup Client: <name-of-the-server>
Restore Client: <name-of-the-server>
Storage: raid5storage
When: 2018-09-12 11:26:27
Catalog: monk-catalog
Priority: 10
Plugin Options: *None*
OK to run? (yes/mod/no):
As you can see, the files will be restored into the /tmp/bacula-restores directory on the client. You can of course change this to / on the server, but be careful: the restore is done as the “bacula” user on the remote server, so it will not be able to write to the / filesystem. I suggest leaving it as is, then logging in on the server and doing something like:
cd /tmp/bacula-restores
tar -pcvf - * | tar -C / -pxvf -
Then you will be able to destroy the source directory. Reboot your new system and enjoy your restored server.
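A sketch of that final cleanup (assuming you kept the default restore location):
rm -rf /tmp/bacula-restores
reboot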
Restoring DB backups¶
We also have a daily backup of the databases. The restore process is pretty similar to the one above, with a slight difference: when you choose the files to restore, you’ll see database names, and by choosing them you are actually choosing which databases to restore.
In order to restore a database, after you have chosen the client to restore from, you will be presented with something like this:
The defined FileSet resources are:
1: Full FS no tablespaces
2: PostgreSQL Logical Backup
3: PostgreSQL tablespace
The “Logical” backup is the one you need. Then you choose the database file(s) to restore, and when you run the job the database will be cleared and restored from the backup copy.
In our Great Plans (tm) we are preparing a pg_barman solution to have a PITR backup for Postgres. Stay tuned for instructions.
Physical Machines¶
Our physical servers are provided by OVH.
DNS Management¶
We manage all of our domains with Route 53 by Amazon, which has some cool features.
We also have an internal DNS service running on ns1.localmonks.net (which is an alias for storage.localmonks.net) and on ns2.localmonks.net, which is its slave. When you install a new server as a BOFH, you should configure, on ns1.localmonks.net, the /etc/named/zones/* files. The most common files to edit are localmonks.net and 0.18.172.in-addr.arpa: the first is used for direct resolution, the second for reverse lookup. Please remember to bump the “serial” number for every edit you make, and then run:
named-checkzone <zone-name> <zone-file-you-changed>
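For example, after editing the two zone files mentioned above (a sketch assuming the default paths):
named-checkzone localmonks.net /etc/named/zones/localmonks.net
named-checkzone 0.18.172.in-addr.arpa /etc/named/zones/0.18.172.in-addr.arpa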
If it’s all ok, you can do:
rndc reload <zone-name>
This way, changes are also propagated to the slave server.
Wordpress/Sites/NodeJS Cluster¶
A brief note: all the services below run as the user “www-data”, apart from NodeJS, which runs as the user “node”. It is done this way because of the group permissions. If you are a NodeJS administrator, you can become the user node with the usual command sudo -u node -i; if you are a www administrator, you can become www-data with sudo -u www-data -i. Log in with your usual personal account, and then become the application user with sudo.
Wordpress¶
On wordpress-be-prod01 and wordpress-be-prod02 we also have a WordPress cluster! You can migrate your site here; ask @agostikkio (the WordPress admin) to integrate your site!
The WP cluster runs on APACHE, and its configuration is in:
/etc/apache2/sites-enabled/default-ssl.conf.
The wordpress cluster runs in /var/www/wordpress. You can login on the Wordpress cluster console by going into
https://wp.monksoftware.it/wp-admin
and logging in with administrator credentials. If you do not have them, ask a BOFH or @agostikkio to create one for you. You can restart Apache with
service apache2 restart
Please remember that /var/www/wordpress hosts multiple sites! You should not, for any reason, modify files in there, otherwise you could break the WordPress multi-site installation. If you need to import new sites, follow the guide below (by @agostikkio):
Export the content of the current site with the native WordPress exporter (Strumenti -> Esporta -> Tutti i contenuti, i.e. Tools -> Export -> All content), obtaining a .xml file.
Dump the current website’s .sql data.
Create a new site from the backoffice network section (I miei siti -> Gestione network -> Siti, i.e. My Sites -> Network Admin -> Sites), choosing a proper new fourth-level domain name. Note that you can also use second-level domains.
Check which new tables (prefix, e.g. ‘wp_8’) this new site will create, and search/replace this prefix in the previously dumped file (ask the BOFH if you don’t know how to do it).
Give the .sql file to the BOFH, who will update the database.
After the database update, import the previously exported .xml data into the new network website, using the native WordPress importer (Strumenti -> Importa, i.e. Tools -> Import).
Check for differences between the current site and the network site after plugins have been installed, data has been imported, and so on.
Ask the BOFH to create the virtualhost on the NGINX Frontends, so that traffic is properly directed to the WordPress cluster. Use the existing files in /etc/nginx/sites-available as a template (a minimal sketch follows after the warnings below).
WARNINGS:
- New plugins have to be installed from the network plugin section (I miei siti -> Gestione network -> Plugin, i.e. My Sites -> Network Admin -> Plugins) and then activated in the new website’s dashboard.
- The exporter does not export all data (e.g. no menus, some plugin functions).
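A minimal sketch of such a frontend virtualhost (the file name, server_name and upstream are placeholders; the real files in /etc/nginx/sites-available remain the authoritative template, including the SSL termination they configure):
# /etc/nginx/sites-available/<new-site>.conf (hypothetical example)
upstream wordpress_backend {
    server wordpress-be-prod01;   # the WP backends named above
    server wordpress-be-prod02;
}

server {
    listen 80;
    server_name <new-site-domain>;

    location / {
        proxy_pass http://wordpress_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}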
Static/PHP sites¶
The wordpress cluster also runs all static/basic sites in /var/www/sites, and those are exposed using NGINX UNIT. The configuration for NGINX UNIT lies in:
/etc/unit/unit.config
You can configure as many applications as you like; UNIT will bring all of them up. Create a configuration like this:
{
"listeners": {
"<server-ip>:<service-port>": {
"application": "<application-name>"
}
},
"applications": {
"<application-name>": {
"type": "php",
"processes": {
"max": 200,
"spare": 10
},
"environment": {
"UMASK": "0002"
},
"user": "www-data",
"group": "nogroup",
"root": "/var/www/homedir",
"script": "index.php"
}
}
}
And then run
/etc/init.d/unit loadconfig <path-to-file>
Now you’ll see that there is a new service running on “<service-port>”. You can proceed by configuring your virtualhost on the NGINX Frontends (nginx-fe01, nginx-fe02, nginx-fe03), which send traffic to the UNIT cluster.
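A sketch of such a frontend virtualhost for a UNIT application (the server_name, IP and port are the placeholders from the configuration above; the existing files in /etc/nginx/sites-available remain the reference):
server {
    listen 80;
    server_name <application-domain>;

    location / {
        proxy_pass http://<server-ip>:<service-port>;   # the UNIT listener configured above
        proxy_set_header Host $host;
    }
}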
NodeJS¶
We’re also running NodeJS servers on this cluster! Become the user “node” (if you are in the “nodeadmins” group) and you’ll be able to check the services by typing:
pm2 status
If you are a NodeJS developer, you’ll find it easy to do your stuff. Please remember that deploying directly on the server using pm2 is forbidden.
Please notice that the wordpress/sites/nodejs cluster runs on a GlusterFS clustered filesystem, so any change you make on one server is instantly reflected on all the other servers. There is no NFS’d single point of failure, and the servers can run independently of each other.
Logging¶
All logging for clustered servers ends up on the storage server, located at storage.localmonks.net and reachable only via VPN. The logs are gathered using syslog from all the concerned servers, where applicable. Currently these are the services logged to storage:
| Service           | Location           |
|-------------------|--------------------|
| Pushcamp          | /var/log/pushcampa |
| Odoo              | /var/log/odoo      |
| Wind-proxy        | /var/log/windproxy |
| XMPP              | /var/log/xmpp      |
| Redis Cluster     | /var/log/redis     |
| Galera Cluster    | /var/log/dbcl      |
| Postgres-XL       | /var/log/dbcl      |
| Cassandra Cluster | /var/log/cassandra |
| Riak KV           | /var/log/riak      |
Logging for Docker Swarm services (such as the buddies, the router, etc.) is done on the Docker Swarm servers, and the logs are fetched using the command
docker service logs <servicename>
You can search for available services with the command
docker service ls
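For example, to follow the most recent log lines of one of the swarm services (the router is one of those mentioned above; -f and --tail are standard docker options):
docker service ls
docker service logs -f --tail 100 router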