==============
Sysadmin Guide
==============

*“Everyone has a test environment... not everyone is lucky enough to have a separate production environment.”*

Application Cluster
===================

The main application cluster is made up of 3 physical machines managed by
`proxmox`_. With Proxmox it is easy to spin up a new VPS from a template, move
a VPS from one physical machine of the cluster to another, power down or
reboot machines, and perform other common sysadmin operations.

.. _proxmox: https://www.proxmox.com/en/

Proxmox's web admin interface
-----------------------------

You can access the Proxmox admin interface using one of the following URLs:

* https://appcl01:8006
* https://appcl02:8006
* https://appcl03:8006
* https://appcl04:8006

You have to be connected to the VPN in order to access them. Access to the
web admin interface is restricted by the following constraints:

* username and password pair
* OTP

Database Clusters
=================

We have highly redundant PostgreSQL, PGPool-II and MariaDB database clusters,
built on `PostgreSQL-XL`_, `PGPool-II`_ and `MariaDB Galera`_ respectively.

.. _PostgreSQL-XL: https://www.postgres-xl.org/overview/
.. _PGPool-II: https://www.pgpool.net
.. _MariaDB Galera: https://mariadb.com/kb/en/library/what-is-mariadb-galera-cluster/

You can read more about the database clusters (and how to fix them in case of
problems) in their documentation section: :doc:`../dbcl/index`

Resizing Filesystems
====================

From time to time a server may run short on disk space, so you will have to
resize its disk ONLINE (i.e. without shutting down the server). Don't worry,
it's easy if you follow the guide below.

Please note that this guide only works if you created the VM from one of the
provided templates (Slackware or Devuan). You can still manage a manually
created VM, but the resize steps below may not work.

To start, log in to one of the Proxmox administration consoles, then go to the
VM whose disk you want to resize. Click on the *Hardware* link and you will be
presented with the virtual hardware of that VM. One of the lines will read
something like this:

.. code-block:: sh

    Hard Disk (scsiX)    VM-Pool_1_vm:vm_xxx-disk-x,size=XXXG

Click on it, then on the *Resize Disk* button above. Enter the number of GB
you want the disk to GROW by. Normally you will want to resize the root
filesystem, because the template by default does not create additional
filesystems for /var, /home, etc., so you will just have /boot, / and the
swap.

Now log in to the server with a root account. If, for example, you resized
the scsi0 disk, you can run:

.. code-block:: sh

    fdisk /dev/sda

(/dev/sdb corresponds to scsi1, and so on.) You will see:

.. code-block:: sh

    Command (m for help):

Type *p* at this prompt and hit return. You will see something like this:

.. code-block:: sh

    Welcome to fdisk (util-linux 2.27.1).
    Changes will remain in memory only, until you decide to write them.
    Be careful before using the write command.

    Command (m for help): p
    Disk /dev/sda: 32 GiB, 34359738368 bytes, 67108864 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: dos
    Disk identifier: 0x72e3860a

    Device     Boot    Start      End  Sectors  Size Id Type
    /dev/sda1           2048  1640447  1638400  800M 83 Linux
    /dev/sda2        1640448 10029055  8388608    4G 82 Linux swap
    /dev/sda3       10029056 67108863 57079808 27.2G 83 Linux

If we resized the /dev/sda disk, remember that the ROOT filesystem, when using
one of the provided templates, is always on the LAST partition of the disk.
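Before touching the partition table, it is worth double-checking that the
guest actually sees the enlarged disk; a minimal check using lsblk, assuming
the resized disk is scsi0 (i.e. /dev/sda):

.. code-block:: sh

    # the SIZE column should already report the new, larger value
    lsblk /dev/sda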
Now you can delete the last partition (/dev/sda3). Don't worry, you will not
lose any data.

.. code-block:: sh

    Command (m for help): d
    Partition number (1-3, default 3): 3

    Partition 3 has been deleted.

Then immediately re-create it with the default size:

.. code-block:: sh

    Command (m for help): n
    Partition type
       p   primary (2 primary, 0 extended, 2 free)
       e   extended (container for logical partitions)
    Select (default p): p
    Partition number (3,4, default 3):
    First sector (10029056-67108863, default 10029056):
    Last sector, +sectors or +size{K,M,G,T,P} (10029056-67108863, default 67108863):

    Created a new partition 3 of type 'Linux' and of size 27.2 GiB.

Finally, write your changes to disk. You will certainly see a warning:

.. code-block:: sh

    Command (m for help): w
    The partition table has been altered.
    Calling ioctl() to re-read partition table.
    Re-reading the partition table failed.: Device or resource busy

    The kernel still uses the old table. The new table will be used at the
    next reboot or after you run partprobe(8) or kpartx(8).

Now you have to tell the kernel that something changed. In this case the disk
is /dev/sda, so:

.. code-block:: sh

    partx -u /dev/sda

Now the kernel knows the partition table changed. Finally we can resize the
filesystem!

.. code-block:: sh

    resize2fs /dev/sda3

Wait a bit, and you will see that your filesystem has grown in size!

Backup
======

The servers are backed up daily using `bacula`_. You can access its web
interface at https://backup-admin while connected to the VPN.

* User is bacula
* Password - ask the BOFH

.. _bacula: https://blog.bacula.org

Putting a server on backup
--------------------------

To put a server on backup, start by cloning a slackware-template or a
devuan-template on the AppCluster. After you have done all the necessary
configuration, edit the following file on the new server:

.. code-block:: sh

    /etc/bacula/bacula-fd.conf

And change the relevant configuration elements:

- Change everything that contains `dbcl03` to `hostname-of-the-server`
- If you want, change FDAddress to the internal server address (this is
  firewalled anyway)
- Change the Password for the Director (Name = `storage-dir`)

A useful command to generate a random password is:

.. code-block:: sh

    openssl rand -base64 33

Remember that this same password must be configured on the bacula server.

Once you are done with the configuration, restart bacula-fd with one of the
following commands, depending on the distribution you cloned in the first
place:

.. code-block:: sh

    /etc/rc.d/rc.bacula restart
    service bacula-fd restart

You can now configure the bacula server. Go to `storage.webmonks.net` and open
the file `/etc/bacula/bacula-dir.conf`. Here is a snippet of what you should
configure in the job section (search for the first "Job {" string):

.. code-block:: sh

    Job {
      Name = "servername-backup"
      JobDefs = "Backup filesystem full no TS"
      Client = servername
    }

And, in the clients section (search for the first "Client {" string):

.. code-block:: sh

    Client {
      Name = servername
      Address = servername.localmonks.net
      FDPort = 9102
      Catalog = monk-catalog
      Password = "..."            # the same password you set in bacula-fd.conf
      File Retention = 15 days
      Job Retention = 1 month
      AutoPrune = yes
    }

After you have finished, launch the command:

.. code-block:: sh

    bconsole

And type:

.. code-block:: sh

    reload

Be careful: if there are errors, bconsole will HANG. If you are cautious, you
can first check for errors by typing:

.. code-block:: sh

    bacula-dir -Ttc /etc/bacula/bacula-dir.conf

And then, if it's all OK, you can reload.
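After a successful reload, a quick sanity check is to ask the director to
contact the new file daemon; a minimal sketch, assuming the client is named
`servername` as in the snippets above:

.. code-block:: sh

    # run a single bconsole command non-interactively
    echo "status client=servername" | bconsole

If the director prints the daemon's version and job status instead of a
connection error, the password and firewall setup are correct.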
After you have reloaded bacula, you can try running your shiny new job from
bconsole by typing:

.. code-block:: sh

    run

And follow the menu.

Restoring a server from backup
------------------------------

Let's hope you never need this. But if you do, here is what you have to do.

Please note that if you are FULL-RESTORING a server, you have to reinstall a
base system from the same template as the server you are going to restore:
you can check which one it was on the appcl cluster console, or you can simply
recognize it from the backup itself, by taking a quick look at the restore
operation below without actually running it. I use the hostname file to tell
them apart: if the backup contains /etc/HOSTNAME, it is a Slackware 14.2
system; otherwise, it is a Devuan system.

Log in to the "storage" server and run the command:

.. code-block:: sh

    bconsole

In the bconsole menu, type:

.. code-block:: sh

    restore

You will be presented with this menu:

.. code-block:: sh

    To select the JobIds, you have the following choices:
         1: List last 20 Jobs run
         2: List Jobs where a given File is saved
         3: Enter list of comma separated JobIds to select
         4: Enter SQL list command
         5: Select the most recent backup for a client
         6: Select backup for a client before a specified time
         7: Enter a list of files to restore
         8: Enter a list of files to restore before a specified time
         9: Find the JobIds of the most recent backup for a client
        10: Find the JobIds for a backup for a client before a specified time
        11: Enter a list of directories to restore for found JobIds
        12: Select full restore to a specified Job date
        13: Cancel

There are several ways to restore data from a backup; let's follow the easiest
one. Choose option *5*. You will then see the list of available clients: each
one is a server you have put on backup. Choose the server you want to restore.

If all went well, you will be presented with a pseudo-shell, where you can
navigate using cd and ls, and where you can type the special command *add* to
add directories or files to the list of files you want to restore.

.. code-block:: sh

    You are now entering file selection mode where you add (mark) and
    remove (unmark) files to be restored. No files are initially added, unless
    you used the "all" keyword on the command line.
    Enter "done" to leave this mode.

    cwd is: /
    $ ls
    bin/
    boot
    dev
    etc/
    home
    initrd.img
    lib/
    lib64/
    media/
    misc
    net
    opt/
    root/
    run
    sbin/
    srv
    usr/
    var/
    vmlinuz

Choose which files you want to restore (or type *add all* to choose them all),
and then type *done*. You will be presented with this menu:

.. code-block:: sh

    1 file selected to be restored.

    Using Catalog "monk-catalog"
    Run Restore job
    JobName:         Restore backup files
    Bootstrap:       /var/lib/bacula/working/storage-dir.restore.1.bsr
    Where:           /tmp/bacula-restores
    Replace:         Always
    FileSet:         Backup
    Client:
    Restore Client:
    Storage:         raid5storage
    When:            2018-09-12 11:26:27
    Catalog:         monk-catalog
    Priority:        10
    Plugin Options:  *None*
    OK to run? (yes/mod/no):

As you can see, the files will be restored into the /tmp/bacula-restores
directory of the client. You can of course change it to / on the server, but
be careful: the restore is performed as the "bacula" user on the remote
server, so it will not be able to write on the / filesystem. I suggest you
leave it as is, then log in on the server and do something like:

.. code-block:: sh

    cd /tmp/bacula-restores
    tar -pcf - * | tar -C / -pxvf -

You can then delete the source directory. Reboot your new system and enjoy
your restored server.
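As an alternative to the tar pipe above, the same copy can be done with rsync;
a sketch, assuming rsync is installed on the restored system and the restore
landed in the default /tmp/bacula-restores directory:

.. code-block:: sh

    # -a preserves permissions, ownership and timestamps;
    # -H keeps hard links, -A ACLs, -X extended attributes
    rsync -aHAX /tmp/bacula-restores/ /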
Restoring DB backups
--------------------

We also have a daily backup for databases. The restore process is very similar
to the one above, with a small difference: when choosing the files to restore
you will see database names, and by choosing them you are actually choosing
which databases to restore.

In order to restore a database, after you have chosen the client to restore
from, you will be presented with something like this:

.. code-block:: sh

    The defined FileSet resources are:
         1: Full FS no tablespaces
         2: PostgreSQL Logical Backup
         3: PostgreSQL tablespace

The "Logical" backup is what you need. Then choose the database file(s) to
restore; when you run the job, the database will be cleared and restored from
the backup copy.

In our Great Plans (tm) we are preparing a pg_barman solution to have a PITR
backup for Postgres. Stay tuned for instructions.

Physical Machines
=================

Our physical servers are provided by `OVH`_.

.. _OVH: https://www.ovh.com/world/

DNS Management
==============

We manage all of our domains with `Route 53`_ by Amazon, which has some cool
features.

.. _Route 53: https://aws.amazon.com/route53/

We also have an internal DNS service running on ns1.localmonks.net (which is
an alias for storage.localmonks.net), and on ns2.localmonks.net, which is its
slave.

Should you, as a BOFH, install a new server, you have to configure the
`/etc/named/zones/*` files on ns1.localmonks.net. The most commonly edited
files are localmonks.net and 0.18.172.in-addr.arpa: the first is used for
direct resolution, the second for reverse lookup. Remember to bump the
"serial" number on every edit you make, and then check the zone, for example:

.. code-block:: sh

    named-checkzone localmonks.net /etc/named/zones/localmonks.net

If it's all OK, you can run:

.. code-block:: sh

    rndc reload

This way, the changes are also propagated to the slave server.

Wordpress/Sites/NodeJS Cluster
==============================

A brief note: all the services below run as user "www-data", apart from NodeJS
which runs as user "node". This is done because of group permissions. If you
are a NodeJS administrator, you can become the node user with the usual
command `sudo -u node -i`; if you are a www administrator, you can become
www-data with `sudo -u www-data -i`. Log in with your usual personal account,
and then become the application user with sudo.

Wordpress
---------

On wordpress-be-prod01 and wordpress-be-prod02 we also have a wordpress
cluster! You can migrate your site here. Ask @agostikkio (the wordpress admin)
to integrate your site!

The WP cluster runs on APACHE, and its configuration is in:

.. code-block:: sh

    /etc/apache2/sites-enabled/default-ssl.conf

The wordpress cluster runs in /var/www/wordpress. You can log in to the
Wordpress cluster console by going to

.. code-block:: sh

    https://wp.monksoftware.it/wp-admin

and logging in with administrator credentials. If you do not have them, ask a
BOFH or @agostikkio to create them for you.

You can restart Apache with:

.. code-block:: sh

    service apache2 restart
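Before (or instead of) a full restart, you may want to validate the
configuration and reload gracefully; this is standard Debian-style Apache
tooling, not something specific to this cluster:

.. code-block:: sh

    # check the configuration for syntax errors first
    apache2ctl configtest
    # then reload workers gracefully instead of a hard restart
    apache2ctl graceful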
*Please remember* that /var/www/wordpress hosts multiple sites! You should
not, for any reason, modify files therein, otherwise you could break the
wordpress multi-site installation.

If you need to import new sites, follow the guide below (by @agostikkio):

1. Export the content of the current site with the native Wordpress exporter
   (Strumenti -> Esporta -> Tutti i contenuti, i.e. Tools -> Export -> All
   content), obtaining an .xml file.
2. Dump the current website's .sql data.
3. Create a new site from the backoffice network section (I miei siti ->
   Gestione network -> Siti, i.e. My sites -> Network admin -> Sites),
   choosing a proper new fourth-level domain name. Note that you can also use
   second-level domains.
4. Check which new tables (prefix, e.g. 'wp_8') this new site will create and
   search/replace this prefix in the previously dumped file (ask a BOFH if you
   don't know how to do it).
5. Give the .sql file to the BOFH, who will update the database.
6. After the database update, import the previously exported .xml data into
   the new network website, using the native Wordpress importer (Strumenti ->
   Importa, i.e. Tools -> Import).
7. Check for differences between the current site and the network site, after
   plugins have been installed, data imported, and so on.
8. Ask the BOFH to create the virtualhost on the NGINX Frontends, so that the
   traffic is properly directed to the Wordpress cluster. Use the existing
   files in /etc/nginx/sites-available as a template.

WARNINGS:

- New plugins have to be installed from the plugin's network section (I miei
  siti -> Gestione network -> Plugin, i.e. My sites -> Network admin ->
  Plugins) and then activated in the new website's dashboard.
- The exporter does not export all data (e.g. no menus, some plugin
  functions).

Static/PHP sites
----------------

The wordpress cluster also runs all static/basic sites in /var/www/sites, and
those are exposed using NGINX UNIT. The configuration for NGINX UNIT lives in:

.. code-block:: sh

    /etc/unit/unit.config

You can configure as many services as you like: UNIT will come up with all of
them. Create a configuration like this (the listener port and application name
shown in angle brackets are placeholders to replace with your own values):

.. code-block:: sh

    {
        "listeners": {
            "*:<port>": {
                "application": "<appname>"
            }
        },

        "applications": {
            "<appname>": {
                "type": "php",
                "processes": {
                    "max": 200,
                    "spare": 10
                },
                "environment": {
                    "UMASK": "0002"
                },
                "user": "www-data",
                "group": "nogroup",
                "root": "/var/www/homedir",
                "script": "index.php"
            }
        }
    }

And then run:

.. code-block:: sh

    /etc/init.d/unit loadconfig

Now you will see that there is a new service running on the port you
configured. You can proceed by configuring your virtualhost on the NGINX
Frontends (nginx-fe01, nginx-fe02, nginx-fe03), which send the traffic to the
UNIT cluster.

NodeJS
------

We're also running NodeJS servers on this cluster! Become user "node", if you
are in the "nodeadmins" group, and you will be able to check the services by
typing:

.. code-block:: sh

    pm2 status

If you are a NodeJS developer, you'll find it easy to do stuff. Please
remember that direct deploys on the server using pm2 are forbidden.

Please note that the wordpress/sites/nodejs cluster runs on a GlusterFS
clustered filesystem, so any change you make on one server is instantly
reflected on all the other servers. There is no NFS single point of failure,
and the servers can run independently of each other.

Owncloud auto-share
===================

If a user happens to have a roaming profile (i.e. a profile living on our
storage server, which gives them the same home directory on whatever server
they ssh into), they can also publish files (and only files) to their Owncloud
directory. Just tell them to create a ".owncloud_tunnel" directory under their
own home dir. Every hour, at minute 16, all files contained therein will be
moved to the Owncloud share. They will know it worked because the files
disappear from their home directory.

Logging
=======

All logging for clustered servers ends up on the storage server, located at
storage.localmonks.net and only reachable via VPN. The logs are gathered using
syslog from all the concerned servers, where applicable.
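The exact syslog setup differs from host to host, but as a sketch, on a client
using rsyslog, forwarding everything to the storage server could look like
this (the drop-in file name and the TCP port 514 are assumptions, not taken
from our actual configuration):

.. code-block:: sh

    # /etc/rsyslog.d/90-forward-to-storage.conf  (hypothetical file name)
    # forward all facilities and priorities to the central log host over TCP
    *.*  @@storage.localmonks.net:514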
Currently these are the services logged to storage:

+-------------------+--------------------+
| Service           | Location           |
+===================+====================+
| Pushcamp          | /var/log/pushcampa |
+-------------------+--------------------+
| Odoo              | /var/log/odoo      |
+-------------------+--------------------+
| Wind-proxy        | /var/log/windproxy |
+-------------------+--------------------+
| XMPP              | /var/log/xmpp      |
+-------------------+--------------------+
| Redis Cluster     | /var/log/redis     |
+-------------------+--------------------+
| Galera Cluster    | /var/log/dbcl      |
+-------------------+--------------------+
| Postgres-XL       | /var/log/dbcl      |
+-------------------+--------------------+
| Cassandra Cluster | /var/log/cassandra |
+-------------------+--------------------+
| Riak KV           | /var/log/riak      |
+-------------------+--------------------+

Logging for Docker swarm services (such as the buddies, the router, etc.) is
done on the Docker Swarm servers, and the logs are fetched using the command:

.. code-block:: sh

    docker service logs <service-name>

You can list the available services with the command:

.. code-block:: sh

    docker service ls
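When troubleshooting a live issue, it is often handy to follow the log stream
and limit the backlog; a minimal sketch using standard docker flags, where
`<service-name>` is one of the names reported by `docker service ls`:

.. code-block:: sh

    # follow new log lines, starting from the last 100, with timestamps
    docker service logs --follow --tail 100 --timestamps <service-name>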