Sync’ing Host and Container Users in Docker
I currently have a project where I need to host multiple users on one server using Docker, with clear segregation between users. I am writing this to document some of my experience in case someone else finds it useful.
This is my first time using Docker in production (albeit not with a strict uptime requirement), so if anyone sees any problem with my setup or knows a better way to do what I’m doing, please share it in the comments!
Overview
In this setup, I have an nginx instance fronting all traffic as an HTTPS terminator and forwarding requests to each site’s Apache instance. The nginx instance is a container running as the “frontend” user. Each site’s Apache, PHP and other services run as that site’s own Unix user (e.g. siteA, siteB, siteC), serving content from its home folder.
I would like all containers to use the same user database to simplify administration. Users can also SSH in via a separate SSHD container to update their websites, so using the same user ID (uid) and group ID (gid) across all containers prevents permission issues and avoids setting insecure permissions (chmod 777!) to allow PHP to write files.
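As a quick check of what this buys us, a file created from inside a container shows up on the host owned by the right user instead of an unmapped uid (a sketch; “sitea_php” and “siteA” are placeholder names):
# Create a file inside the PHP container as the site user...
docker exec -u siteA sitea_php touch /home/siteA/test.txt
# ...and verify its ownership on the host.
ls -l /home/siteA/test.txt   # owned by siteA, not a bare numeric uid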
Users Setup
Looking around the Internet, I found many solutions to this issue. The more complex ones involve setting up an LDAP server, which I think is overkill for a single-machine setup. I have chosen a simpler method: using file bind mounts to mount /etc/passwd (and related files) into the container via docker-compose.
version: '2'
services:
  sshd:
    image: my/image
    container_name: my_container_name
    hostname: access-container
    volumes:
      # For access by each user
      - /home:/home
      # Authentication
      # passwd/group should be mounted into any container
      # needing to share the user/group IDs
      - /etc/passwd:/etc/passwd:ro
      - /etc/group:/etc/group:ro
      # Shadow should only be mounted into containers
      # needing to authenticate against PAM
      - /etc/shadow:/etc/shadow:ro
    restart: always
    ports:
      - "22:22"
This works reasonably well in the sense that files created by the correct user in the container have the correct uid/gid. However, there are some issues.
Issue: Updated Passwords Don’t Sync into the SSHD container
One issue is that once I’ve used passwd to change a password on the host, the change is not reflected in the container until I restart it. For containers that just need the uid/gid of the site’s user, such as Apache or PHP, this is not a problem. However, for containers that perform authentication, such as SSHD, it is obviously undesirable. I don’t want to restart the container after every password update, as that would kill any active sessions!
The cause is that file bind mounts are based on the inode number (a number that internally identifies a file). Normally, when you edit a file, the inode number is preserved. If you rename a file, the inode number is also preserved. However, if you delete a file and re-create it in the same location, the inode number may change.
Examples:
A new file is created and its inode number is 1446924
root@e229856e1806:~/test# echo hello > test
root@e229856e1806:~/test# ls -li
total 4
1446924 -rw-r--r-- 1 root root 6 Mar 11 12:27 test
The file is modified and its inode number is still 1446924
root@e229856e1806:~/test# echo hello world >> test
root@e229856e1806:~/test# ls -li
total 4
1446924 -rw-r--r-- 1 root root 18 Mar 11 12:27 test
The file is renamed and its inode number is still 1446924
root@e229856e1806:~/test# mv test test2
root@e229856e1806:~/test# ls -li
total 4
1446924 -rw-r--r-- 1 root root 18 Mar 11 12:27 test2
The file is deleted and re-created. It has a new inode number, 1446925. (Note: the re-created file may end up with the same number, depending on whether other files were created in the meantime; if it does, that is just a coincidence.)
root@e229856e1806:~/test# rm test2
root@e229856e1806:~/test# echo Hello > test2
root@e229856e1806:~/test# ls -li
total 4
1446925 -rw-r--r-- 1 root root 6 Mar 11 12:28 test2
The problem is that the passwd process first renames /etc/shadow to a backup file and creates a new file with a new inode number to replace it, ensuring that the file never gets corrupted. While good for normal operation, the file bind mount still points to the old inode number, and thus the containers see the outdated file.
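You can observe this on the host when a password changes (a quick sketch; “someuser” is a placeholder):
ls -i /etc/shadow   # note the inode number
passwd someuser     # change any password
ls -i /etc/shadow   # the inode number has changed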
Solution
The solution I’ve used is to copy the passwd, group and shadow files to another location, /opt/passwd/. The files in /opt/passwd/ are bind-mounted into the container and are (currently manually) copied from the host system whenever they are updated.
cp -p /etc/{passwd,group,shadow} /opt/passwd
The cp command overwrites the destination file in place, preserving its inode number, so Docker will see the updated file. The updated docker-compose file is as follows.
version: '2'
services:
  sshd:
    image: my/image
    container_name: my_container_name
    hostname: access-container
    volumes:
      # For access by each user
      - /home:/home
      # Authentication
      - /opt/passwd/passwd:/etc/passwd:ro
      - /opt/passwd/group:/etc/group:ro
      - /opt/passwd/shadow:/etc/shadow:ro
    restart: always
    ports:
      - "22:22"
Running as Another User
Now that the same user database is synced across different containers, I would like to be able to run a container as a specific user.
In this case, I want to run all nginx processes as the “frontend” user. You may ask: why not just use nginx’s ability to run its worker processes as non-root? I have a cron job (also running as the “frontend” user) that automatically renews Let’s Encrypt certificates. After the renewal finishes, I want the script to send a HUP signal to nginx so that it reloads the certificates. If nginx’s master process were running as root, the cron job would not have permission to send a signal to it.
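For reference, the renewal job boils down to something like this (a sketch; the certbot arguments and pid file path are assumptions, not my exact script):
# Runs from cron as "frontend". Since nginx's master process also
# runs as "frontend", we are allowed to signal it.
certbot renew --webroot -w /var/www/letsencrypt
kill -HUP "$(cat /var/run/nginx.pid)"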
“User” Option
The first thing that came to mind is to use the “user” option in the docker-compose file like this (“frontend” is a user that exists on the host system):
version: '2'
services:
  frontend:
    image: nginx:stable-alpine
    container_name: shared_frontend
    user: "frontend"
    volumes:
      - ./nginx:/etc/nginx:ro
      - /etc/passwd:/etc/passwd:ro
      - /etc/group:/etc/group:ro
    ports:
      - "80:8080"
      - "443:8443"
However, this is the result…
Recreating shared_frontend
ERROR: for frontend Cannot start service frontend: linux spec user: unable to find user frontend: no matching entries in passwd file
ERROR: Encountered errors while bringing up the project.
It appears that Docker reads the user database before the bind mount is applied, so it cannot find the specified user. One alternative is to use the numeric ID instead.
version: '2'
services:
  frontend:
    image: nginx:stable-alpine
    container_name: shared_frontend
    user: "1001" # 1001 = frontend
    volumes:
      - ./nginx:/etc/nginx:ro
      - /etc/passwd:/etc/passwd:ro
      - /etc/group:/etc/group:ro
    ports:
      - "80:8080"
      - "443:8443"
At first, everything looks good, until you look closer…
mymachine:~ pawitp$ docker exec -it shared_frontend sh
/ $ id
uid=1001(frontend) gid=0(root)
Because Docker could not find the entry in the passwd file (again, the bind mount is not yet applied when Docker resolves the user), it did not change the group for us, leaving the program with some super-user privileges that it shouldn’t have.
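Docker does accept an explicit “uid:gid” pair, which at least fixes the group (still hard-coded, so only a sketch):
user: "1001:1001" # 1001 = frontend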
In addition, hard-coding the uid inside the docker-compose file makes it unportable between systems unless they both have the same user database.
Meet gosu and su-exec
From MariaDB’s official image, I found out about gosu and su-exec. They are simplified versions of the su (switch user) utility; the former is written in Go, the latter in C. Normally, su runs setuid, allowing any user to switch to root and root to switch to other users. Neither gosu nor su-exec is setuid; they only allow root to drop to unprivileged users. Perfect!
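Both tools share the same minimal interface: su-exec user[:group] command [args...]. For example, as root inside a container:
# Run a single command as "frontend", then exit.
su-exec frontend id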
By using gosu or su-exec, you can let Docker start the container as root and drop privileges the moment the process starts. The docker-compose file will be something like this (note that I’ve bind-mounted the su-exec executable because I did not want to build and maintain a custom image just to add one executable):
version: '2'
services:
  frontend:
    image: nginx:stable-alpine
    container_name: shared_frontend
    volumes:
      - ./nginx:/etc/nginx:ro
      - /etc/passwd:/etc/passwd:ro
      - /etc/group:/etc/group:ro
      - /opt/su-exec:/usr/sbin/su-exec:ro
    ports:
      - "80:8080"
      - "443:8443"
    command: [/usr/sbin/su-exec, frontend, nginx, -g, "daemon off;"]
That looks pretty good in theory, but when I run it, I am greeted with the following error:
shared_frontend | nginx: [alert] could not open error log file: open() "/var/log/nginx/error.log" failed (13: Permission denied)
shared_frontend | 2017/03/11 14:49:43 [emerg] 1#1: open() "/var/log/nginx/error.log" failed (13: Permission denied)
shared_frontend exited with code 1
Note that in the nginx image, error.log is a symlink to stderr. Long story short, even on standard Linux, once you use su, you lose the ability to write to /dev/stdout and /dev/stderr for some reason. See more at Issue 31243. A workaround I’ve found is to chmod those two devices to give write permission to everyone, after which writing works fine.
First, I’m root and I can write to /dev/stderr
root@e84410c9a2ee:/# id
uid=0(root) gid=0(root) groups=0(root)
root@e84410c9a2ee:/# echo 1 > /dev/stderr
1
However, once I switch to the “test” user, I can no longer write to /dev/stderr
root@e84410c9a2ee:/# useradd test
root@e84410c9a2ee:/# su - test
No directory, logging in with HOME=/
$ echo 1 > /dev/stderr
-su: 1: cannot create /dev/stderr: Permission denied
If I switch back to root and change the permission to 622, I can write to /dev/stderr as “test” again.
$ exit
root@e84410c9a2ee:/# chmod 622 /dev/stderr
root@e84410c9a2ee:/# su - test
No directory, logging in with HOME=/
$ echo 1 > /dev/stderr
1
Thus, as a temporary fix, I have patched su-exec to chmod those files before it calls setuid. Link: https://github.com/pawitp/su-exec/commit/52f4163176bc8f5714c2bc4cd5359ea10627e85d.
With the patched su-exec binary, the container runs as expected.
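An alternative that avoids maintaining a patched binary would be to chmod the devices as part of the container’s command, while still root, before dropping privileges (a sketch I have not adopted):
command: [sh, -c, "chmod 622 /dev/stdout /dev/stderr && exec /usr/sbin/su-exec frontend nginx -g 'daemon off;'"]
Both approaches do the same thing; the chmod just has to happen before the switch to the unprivileged user.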
Unsolved Issues
These are issues I have not found a good way to solve yet. Please let me know if you know a good way!
Insecure links to MySQL/MariaDB
I have deployed one central instance of MariaDB for all sites to use. For PHP to be able to access it, I have put all PHP instances on MariaDB’s network. An unintended side effect of this is that all PHP instances (from different sites) can also communicate with each other!
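To illustrate the topology (a sketch; service and network names are placeholders, not my actual compose file):
services:
  mariadb:
    networks: [db]
  sitea_php:
    networks: [db] # can reach mariadb, but also siteb_php
  siteb_php:
    networks: [db] # can reach mariadb, but also sitea_php
networks:
  db: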
I can’t find an easy way to set up a firewall to limit this access, or to configure PHP to accept connections only from its own Apache, without resorting to static IPs for all containers.
Allowing users to change their password
Currently, users cannot change their passwords by themselves. Running passwd from the SSHD container will not work due to the bind mount; the passwd command needs to be run on the host system. However, I am reluctant to give users access to the host, because they would gain access to ports opened by all Docker containers, including containers that may not have any access protection.