==============
Sysadmin Guide
==============

*“Everyone has a test environment... not everyone is lucky enough to have a separate production environment.”*

Application Cluster
===================

The main application cluster is made up of 3 physical machines managed by
`proxmox`_. With Proxmox it is easy to spin up a new VPS from a template, move
a VPS from one physical machine of the cluster to another, power down or
reboot machines, and perform other common sysadmin operations.

.. _proxmox: https://www.proxmox.com/en/

Proxmox's web admin interface
-----------------------------

You can access the Proxmox admin interface using one of the following URLs:

* https://appcl01:8006
* https://appcl02:8006
* https://appcl03:8006
* https://appcl04:8006

You have to be connected to the VPN in order to access them. Access to the
web admin interface is restricted by the following constraints:

* username and password pair
* OTP

Database Clusters
=================

We have highly redundant PostgreSQL, PGPool-II and MariaDB database clusters,
built on `PostgreSQL-XL`_, `PGPool-II`_ and `MariaDB Galera`_ respectively.

.. _PostgreSQL-XL: https://www.postgres-xl.org/overview/
.. _PGPool-II: https://www.pgpool.net
.. _MariaDB Galera: https://mariadb.com/kb/en/library/what-is-mariadb-galera-cluster/

You can read more about the database clusters (and how to fix them in case of
problems) in their documentation section: :doc:`../dbcl/index`

Resizing Filesystems
====================

From time to time a server may run short on disk space, so you will have to
resize its disk ONLINE (i.e. without shutting down the server). Don't worry,
it's easy if you follow the guide below.

Please note that this guide only works if you created the VM from one of the
provided templates (Slackware or Devuan). You can still manage a manually
created VM, but the resize steps below may not work.

To start, log in to one of the Proxmox administration consoles, then go to the
VM whose disk you want to resize. Click on the *Hardware* link and you will be
presented with the virtual hardware of that VM. One of the lines will read
something like this:

.. code-block:: sh

    Hard Disk (scsiX)    VM-Pool_1_vm:vm_xxx-disk-x,size=XXXG

Click on it, then on the *Resize Disk* button above. Enter the number of GB
you want the disk to GROW by. Normally you will want to resize the root
filesystem, because the template by default does not create additional
filesystems for /var, /home, etc., so you will just have /boot, / and the
swap.

Now log in to the server with a root account. If, for example, you resized
the scsi0 disk, you can run:

.. code-block:: sh

    fdisk /dev/sda

(/dev/sdb corresponds to scsi1, and so on.) You will see:

.. code-block:: sh

    Command (m for help):

Type *p* at this prompt and hit return. You will see something like this:

.. code-block:: sh

    Welcome to fdisk (util-linux 2.27.1).
    Changes will remain in memory only, until you decide to write them.
    Be careful before using the write command.

    Command (m for help): p
    Disk /dev/sda: 32 GiB, 34359738368 bytes, 67108864 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: dos
    Disk identifier: 0x72e3860a

    Device     Boot    Start      End  Sectors  Size Id Type
    /dev/sda1           2048  1640447  1638400  800M 83 Linux
    /dev/sda2        1640448 10029055  8388608    4G 82 Linux swap
    /dev/sda3       10029056 67108863 57079808 27.2G 83 Linux

If we resized the /dev/sda disk, remember that the ROOT filesystem, when using
one of the provided templates, is always on the LAST partition of the disk.
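Before touching the partition table, it is worth double-checking that the
guest actually sees the enlarged disk; a minimal check using lsblk, assuming
the resized disk is scsi0 (i.e. /dev/sda):

.. code-block:: sh

    # the SIZE column should already report the new, larger value
    lsblk /dev/sda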
Now you can delete the last partition (/dev/sda3). Don't worry, you will not
lose any data.

.. code-block:: sh

    Command (m for help): d
    Partition number (1-3, default 3): 3

    Partition 3 has been deleted.

Then immediately re-create it with the default size:

.. code-block:: sh

    Command (m for help): n
    Partition type
       p   primary (2 primary, 0 extended, 2 free)
       e   extended (container for logical partitions)
    Select (default p): p
    Partition number (3,4, default 3):
    First sector (10029056-67108863, default 10029056):
    Last sector, +sectors or +size{K,M,G,T,P} (10029056-67108863, default 67108863):

    Created a new partition 3 of type 'Linux' and of size 27.2 GiB.

Finally, write your changes to disk. You will certainly see a warning:

.. code-block:: sh

    Command (m for help): w
    The partition table has been altered.
    Calling ioctl() to re-read partition table.
    Re-reading the partition table failed.: Device or resource busy

    The kernel still uses the old table. The new table will be used at the
    next reboot or after you run partprobe(8) or kpartx(8).

Now you have to tell the kernel that something changed. In this case the disk
is /dev/sda, so:

.. code-block:: sh

    partx -u /dev/sda

Now the kernel knows the partition table changed. Finally we can resize the
filesystem!

.. code-block:: sh

    resize2fs /dev/sda3

Wait a bit, and you will see that your filesystem has grown in size!

Backup
======

The servers are backed up daily using `bacula`_. You can access its web
interface at https://backup-admin while connected to the VPN.

* User is bacula
* Password - ask the BOFH

.. _bacula: https://blog.bacula.org

Putting a server on backup
--------------------------

To put a server on backup, start by cloning a slackware-template or a
devuan-template on the AppCluster. After you have done all the necessary
configuration, edit the following file on the new server:

.. code-block:: sh

    /etc/bacula/bacula-fd.conf

And change the relevant configuration elements:

- Change everything that contains `dbcl03` to `hostname-of-the-server`
- If you want, change FDAddress to the internal server address (this is
  firewalled anyway)
- Change the Password for the Director (Name = `storage-dir`)

A useful command to generate a random password is:

.. code-block:: sh

    openssl rand -base64 33

Remember that this same password must be configured on the bacula server.

Once you are done with the configuration, restart bacula-fd with one of the
following commands, depending on the distribution you cloned in the first
place:

.. code-block:: sh

    /etc/rc.d/rc.bacula restart
    service bacula-fd restart

You can now configure the bacula server. Go to `storage.webmonks.net` and open
the file `/etc/bacula/bacula-dir.conf`. Here is a snippet of what you should
configure in the job section (search for the first "Job {" string):

.. code-block:: sh

    Job {
      Name = "servername-backup"
      JobDefs = "Backup filesystem full no TS"
      Client = servername
    }

And, in the clients section (search for the first "Client {" string):

.. code-block:: sh

    Client {
      Name = servername
      Address = servername.localmonks.net
      FDPort = 9102
      Catalog = monk-catalog
      Password = "..."            # the same password you set in bacula-fd.conf
      File Retention = 15 days
      Job Retention = 1 month
      AutoPrune = yes
    }

After you have finished, launch the command:

.. code-block:: sh

    bconsole

And type:

.. code-block:: sh

    reload

Be careful: if there are errors, bconsole will HANG. If you are cautious, you
can first check for errors by typing:

.. code-block:: sh

    bacula-dir -Ttc /etc/bacula/bacula-dir.conf

And then, if it's all OK, you can reload.
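After a successful reload, a quick sanity check is to ask the director to
contact the new file daemon; a minimal sketch, assuming the client is named
`servername` as in the snippets above:

.. code-block:: sh

    # run a single bconsole command non-interactively
    echo "status client=servername" | bconsole

If the director prints the daemon's version and job status instead of a
connection error, the password and firewall setup are correct.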
After you have reloaded bacula, you can try running your shiny new job from
bconsole by typing:

.. code-block:: sh

    run

And follow the menu.

Restoring a server from backup
------------------------------

Let's hope you never need this. But if you do, here is what you have to do.

Please note that if you are FULL-RESTORING a server, you have to reinstall a
base system from the same template as the server you are going to restore:
you can check which one it was on the appcl cluster console, or you can simply
recognize it from the backup itself, by taking a quick look at the restore
operation below without actually running it. I use the hostname file to tell
them apart: if the backup contains /etc/HOSTNAME, it is a Slackware 14.2
system; otherwise, it is a Devuan system.

Log in to the "storage" server and run the command:

.. code-block:: sh

    bconsole

In the bconsole menu, type:

.. code-block:: sh

    restore

You will be presented with this menu:

.. code-block:: sh

    To select the JobIds, you have the following choices:
         1: List last 20 Jobs run
         2: List Jobs where a given File is saved
         3: Enter list of comma separated JobIds to select
         4: Enter SQL list command
         5: Select the most recent backup for a client
         6: Select backup for a client before a specified time
         7: Enter a list of files to restore
         8: Enter a list of files to restore before a specified time
         9: Find the JobIds of the most recent backup for a client
        10: Find the JobIds for a backup for a client before a specified time
        11: Enter a list of directories to restore for found JobIds
        12: Select full restore to a specified Job date
        13: Cancel

There are several ways to restore data from a backup; let's follow the easiest
one. Choose option *5*. You will then see the list of available clients: each
one is a server you have put on backup. Choose the server you want to restore.

If all went well, you will be presented with a pseudo-shell, where you can
navigate using cd and ls, and where you can type the special command *add* to
add directories or files to the list of files you want to restore.

.. code-block:: sh

    You are now entering file selection mode where you add (mark) and
    remove (unmark) files to be restored. No files are initially added, unless
    you used the "all" keyword on the command line.
    Enter "done" to leave this mode.

    cwd is: /
    $ ls
    bin/
    boot
    dev
    etc/
    home
    initrd.img
    lib/
    lib64/
    media/
    misc
    net
    opt/
    root/
    run
    sbin/
    srv
    usr/
    var/
    vmlinuz

Choose which files you want to restore (or type *add all* to choose them all),
and then type *done*. You will be presented with this menu:

.. code-block:: sh

    1 file selected to be restored.

    Using Catalog "monk-catalog"
    Run Restore job
    JobName:         Restore backup files
    Bootstrap:       /var/lib/bacula/working/storage-dir.restore.1.bsr
    Where:           /tmp/bacula-restores
    Replace:         Always
    FileSet:         Backup
    Client:
    Restore Client:
    Storage:         raid5storage
    When:            2018-09-12 11:26:27
    Catalog:         monk-catalog
    Priority:        10
    Plugin Options:  *None*
    OK to run? (yes/mod/no):

As you can see, the files will be restored into the /tmp/bacula-restores
directory of the client. You can of course change it to / on the server, but
be careful: the restore is performed as the "bacula" user on the remote
server, so it will not be able to write on the / filesystem. I suggest you
leave it as is, then log in on the server and do something like:

.. code-block:: sh

    cd /tmp/bacula-restores
    tar -pcf - * | tar -C / -pxvf -

You can then delete the source directory. Reboot your new system and enjoy
your restored server.
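As an alternative to the tar pipe above, the same copy can be done with rsync;
a sketch, assuming rsync is installed on the restored system and the restore
landed in the default /tmp/bacula-restores directory:

.. code-block:: sh

    # -a preserves permissions, ownership and timestamps;
    # -H keeps hard links, -A ACLs, -X extended attributes
    rsync -aHAX /tmp/bacula-restores/ /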
Restoring DB backups
--------------------

We also have a daily backup for databases. The restore process is very similar
to the one above, with a small difference: when choosing the files to restore
you will see database names, and by choosing them you are actually choosing
which databases to restore.

In order to restore a database, after you have chosen the client to restore
from, you will be presented with something like this:

.. code-block:: sh

    The defined FileSet resources are:
         1: Full FS no tablespaces
         2: PostgreSQL Logical Backup
         3: PostgreSQL tablespace

The "Logical" backup is what you need. Then choose the database file(s) to
restore; when you run the job, the database will be cleared and restored from
the backup copy.

In our Great Plans (tm) we are preparing a pg_barman solution to have a PITR
backup for Postgres. Stay tuned for instructions.

Physical Machines
=================

Our physical servers are provided by `OVH`_.

.. _OVH: https://www.ovh.com/world/

DNS Management
==============

We manage all of our domains with `Route 53`_ by Amazon, which has some cool
features.

.. _Route 53: https://aws.amazon.com/route53/

We also have an internal DNS service running on ns1.localmonks.net (which is
an alias for storage.localmonks.net), and on ns2.localmonks.net, which is its
slave.

Should you, as a BOFH, install a new server, you have to configure the
`/etc/named/zones/*` files on ns1.localmonks.net. The most commonly edited
files are localmonks.net and 0.18.172.in-addr.arpa: the first is used for
direct resolution, the second for reverse lookup. Remember to bump the
"serial" number on every edit you make, and then check the zone, for example:

.. code-block:: sh

    named-checkzone localmonks.net /etc/named/zones/localmonks.net

If it's all OK, you can run:

.. code-block:: sh

    rndc reload

This way, the changes are also propagated to the slave server.

Wordpress/Sites/NodeJS Cluster
==============================

A brief note: all the services below run as user "www-data", apart from NodeJS
which runs as user "node". This is done because of group permissions. If you
are a NodeJS administrator, you can become the node user with the usual
command `sudo -u node -i`; if you are a www administrator, you can become
www-data with `sudo -u www-data -i`. Log in with your usual personal account,
and then become the application user with sudo.

Wordpress
---------

On wordpress-be-prod01 and wordpress-be-prod02 we also have a wordpress
cluster! You can migrate your site here. Ask @agostikkio (the wordpress admin)
to integrate your site!

The WP cluster runs on APACHE, and its configuration is in:

.. code-block:: sh

    /etc/apache2/sites-enabled/default-ssl.conf

The wordpress cluster runs in /var/www/wordpress. You can log in to the
Wordpress cluster console by going to

.. code-block:: sh

    https://wp.monksoftware.it/wp-admin

and logging in with administrator credentials. If you do not have them, ask a
BOFH or @agostikkio to create them for you.

You can restart Apache with:

.. code-block:: sh

    service apache2 restart
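Before (or instead of) a full restart, you may want to validate the
configuration and reload gracefully; this is standard Debian-style Apache
tooling, not something specific to this cluster:

.. code-block:: sh

    # check the configuration for syntax errors first
    apache2ctl configtest
    # then reload workers gracefully instead of a hard restart
    apache2ctl graceful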
*Please remember* that /var/www/wordpress hosts multiple sites! You should
not, for any reason, modify files therein, otherwise you could break the
wordpress multi-site installation.

If you need to import new sites, follow the guide below (by @agostikkio):

1. Export the content of the current site with the native Wordpress exporter
   (Strumenti -> Esporta -> Tutti i contenuti, i.e. Tools -> Export -> All
   content), obtaining an .xml file.
2. Dump the current website's .sql data.
3. Create a new site from the backoffice network section (I miei siti ->
   Gestione network -> Siti, i.e. My sites -> Network admin -> Sites),
   choosing a proper new fourth-level domain name. Note that you can also use
   second-level domains.
4. Check which new tables (prefix, e.g. 'wp_8') this new site will create and
   search/replace this prefix in the previously dumped file (ask a BOFH if you
   don't know how to do it).
5. Give the .sql file to the BOFH, who will update the database.
6. After the database update, import the previously exported .xml data into
   the new network website, using the native Wordpress importer (Strumenti ->
   Importa, i.e. Tools -> Import).
7. Check for differences between the current site and the network site, after
   plugins have been installed, data imported, and so on.
8. Ask the BOFH to create the virtualhost on the NGINX Frontends, so that the
   traffic is properly directed to the Wordpress cluster. Use the existing
   files in /etc/nginx/sites-available as a template.

WARNINGS:

- New plugins have to be installed from the plugin's network section (I miei
  siti -> Gestione network -> Plugin, i.e. My sites -> Network admin ->
  Plugins) and then activated in the new website's dashboard.
- The exporter does not export all data (e.g. no menus, some plugin
  functions).

Static/PHP sites
----------------

The wordpress cluster also runs all static/basic sites in /var/www/sites, and
those are exposed using NGINX UNIT. The configuration for NGINX UNIT lives in:

.. code-block:: sh

    /etc/unit/unit.config

You can configure as many services as you like: UNIT will come up with all of
them. Create a configuration like this (the listener port and application name
shown in angle brackets are placeholders to replace with your own values):

.. code-block:: sh

    {
        "listeners": {
            "*:<port>": {
                "application": "<appname>"
            }
        },

        "applications": {
            "<appname>": {
                "type": "php",
                "processes": {
                    "max": 200,
                    "spare": 10
                },
                "environment": {
                    "UMASK": "0002"
                },
                "user": "www-data",
                "group": "nogroup",
                "root": "/var/www/homedir",
                "script": "index.php"
            }
        }
    }

And then run:

.. code-block:: sh

    /etc/init.d/unit loadconfig

Now you will see that there is a new service running on the port you
configured. You can proceed by configuring your virtualhost on the NGINX
Frontends (nginx-fe01, nginx-fe02, nginx-fe03), which send the traffic to the
UNIT cluster.

NodeJS
------

We're also running NodeJS servers on this cluster! Become user "node", if you
are in the "nodeadmins" group, and you will be able to check the services by
typing:

.. code-block:: sh

    pm2 status

If you are a NodeJS developer, you'll find it easy to do stuff. Please
remember that direct deploys on the server using pm2 are forbidden.

Please note that the wordpress/sites/nodejs cluster runs on a GlusterFS
clustered filesystem, so any change you make on one server is instantly
reflected on all the other servers. There is no NFS single point of failure,
and the servers can run independently of each other.

Owncloud auto-share
===================

If a user happens to have a roaming profile (i.e. a profile living on our
storage server, which gives them the same home directory on whatever server
they ssh into), they can also publish files (and only files) to their Owncloud
directory. Just tell them to create a ".owncloud_tunnel" directory under their
own home dir. Every hour, at minute 16, all files contained therein will be
moved to the Owncloud share. They will know it worked because the files
disappear from their home directory.

Logging
=======

All logging for clustered servers ends up on the storage server, located at
storage.localmonks.net and only reachable via VPN. The logs are gathered using
syslog from all the concerned servers, where applicable.
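The exact syslog setup differs from host to host, but as a sketch, on a client
using rsyslog, forwarding everything to the storage server could look like
this (the drop-in file name and the TCP port 514 are assumptions, not taken
from our actual configuration):

.. code-block:: sh

    # /etc/rsyslog.d/90-forward-to-storage.conf  (hypothetical file name)
    # forward all facilities and priorities to the central log host over TCP
    *.*  @@storage.localmonks.net:514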
Currently these are the services logged to storage:

+-------------------+--------------------+
| Service           | Location           |
+===================+====================+
| Pushcamp          | /var/log/pushcampa |
+-------------------+--------------------+
| Odoo              | /var/log/odoo      |
+-------------------+--------------------+
| Wind-proxy        | /var/log/windproxy |
+-------------------+--------------------+
| XMPP              | /var/log/xmpp      |
+-------------------+--------------------+
| Redis Cluster     | /var/log/redis     |
+-------------------+--------------------+
| Galera Cluster    | /var/log/dbcl      |
+-------------------+--------------------+
| Postgres-XL       | /var/log/dbcl      |
+-------------------+--------------------+
| Cassandra Cluster | /var/log/cassandra |
+-------------------+--------------------+
| Riak KV           | /var/log/riak      |
+-------------------+--------------------+

Logging for Docker swarm services (such as the buddies, the router, etc.) is
done on the Docker Swarm servers, and the logs are fetched using the command:

.. code-block:: sh

    docker service logs <service-name>

You can list the available services with the command:

.. code-block:: sh

    docker service ls
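When troubleshooting a live issue, it is often handy to follow the log stream
and limit the backlog; a minimal sketch using standard docker flags, where
`<service-name>` is one of the names reported by `docker service ls`:

.. code-block:: sh

    # follow new log lines, starting from the last 100, with timestamps
    docker service logs --follow --tail 100 --timestamps <service-name>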