Kubernetes Migration Case Study

Migrating Netdelta From Docker to Kubernetes

In latish 2020, I moved Netdelta from a Docker deployment to Kubernetes, partly to see what all this Kubernetes jazz is about, and partly to investigate whether it would help me with the management of Netdelta containers for different punters, each of whom has their own docker container and Apache listening service.

I studiously went through the Kubernetes quick tutorial and found i had to investigate the documentation some more. Even then some aspects weren’t covered so well. This post explains what i did to deploy an app into Kubernetes, and some of the gotchas i encountered along the way, that were not covered so well in the Kubernetes documentation, and I summarise with a view of Kubernetes and give my view on: is the hype justified? Will I continue to host Netdelta in Kubernetes?

This is not a Kubernetes tutorial – it does assume some prior exposure on behalf of the reader, but nonethless links to the relevant documentation when some Kubernetes concepts are covered.

Contents

Netdelta in Docker

This post isn’t about Netdelta, but for illustrative purposes: Netdelta aids with the detection of unauthorised changes, and hacker shells, by running one-off port scans, or scheduled jobs, comparing the results with the previous scan, and alerting on changes. This is more chunky than it sounds, mostly because of the analytics that goes into false positives detection. In the Kubernetes implementation, scan results are held in a stateful persistent volume with MySQL.

Netdelta’s docker config can be dug into here, but to summarise the docker setup:

  • Database container – MySQL 5.7
  • Application container – Apache, Django 3.1.4, Celery 5.0.5, Netdelta
  • Fileserver (logs, virtualenv, code deployment)
  • Docker volumes and networking are utilised

Data Flows / Networking

The data flows aspect reflects what is not exactly a bare metal deployment. A Linode-hosted VM running Ubuntu 20 is the host, then the Kubernetes node is minikube, with another node running on a Raspberry pi 3 – the latter aspect not being a production facility. The pi 3 was only to test how well the config would work with load balancing, and Kubernetes Replicasets across nodes.

Reverse Proxy

Ingress connections from the internet are handled first by nginx acting as a reverse proxy. Base URLs for Netdelta are of the form https://www.netdelta.io/<site>. The nginx config …

server {
    listen 80;
    location /barbican {
	proxy_set_header Accept-Encoding "";
	sub_filter_types text/html text/css text/xml;
	sub_filter $host $host/barbican;
        proxy_pass http://local.netdelta.io/barbican;
    }
}

K8s Ingress Controller

This is passing a URL with a first level of <site> to be processed at local.netdelta.io, which is locally resolvable, and is localhost. This is where the nginx Kubernetes Ingress Controller comes into play. The pods in kubernetes have NodePorts configured but these aren’t necessary. The nginx ingress controller takes connections on port 80, and routes based on service names and the defined listening port:

┌──(iantibble㉿bionic)-[~]
└─$ kubectl describe ingress
Name:             netdelta-ingress
Namespace:        default
Address:          172.17.0.2
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
  Host               Path  Backends
  ----               ----  --------
  local.netdelta.io
                     /barbican   netdelta-barbican:9004 (<none>)
Annotations:         <none>
Events:              <none>

The YAML looks thusly:

┌──(iantibble㉿bionic)-[~/netdd/k8s]
└─$ cat ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: netdelta-ingress
spec:
  rules:
    - host: local.netdelta.io
      http:
        paths:
          - path: /barbican
            backend:
              service:
                name: netdelta-barbican
                port:
                  number: 9004
            pathType: Prefix

So the nginx ingress controllers sees the connection forwarded from local.netdelta.io with a URL request of local.netdelta.io/<site>. The requests matches a rule, and forwards to the Kubernetes Service of the same name. The entity that actually answers the call is a docker container masquerading as a Kubernetes Pod, which is part of a deployment. The next step in the data flow is to route the connection to the specified Kubernetes Service which is covered briefly here but in more detail later in the coverage of DNS.

The “service” aspect has the effect of exposing the pod according to the service setup:

┌──(iantibble㉿bionic)-[~/netdd/k8s]
└─$ kubectl get services -o wide
NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE    SELECTOR
kubernetes          ClusterIP   10.96.0.1                443/TCP          119d   
mysql-netdelta      ClusterIP   10.97.140.111            3306/TCP         39d    app=mysql-netdelta
netdelta-barbican   NodePort    10.103.160.223           9004:30460/TCP   36d    app=netdelta-barbican
netdelta-xynexis    NodePort    10.102.53.156            9005:31259/TCP   36d    app

DNS

There’s an awful lot of waffle out there about DNS and Kubernetes. Basically – and I know the god of devops won’t let me in heaven for saying this, but making a service in Kubernetes leads to DNS being enabled. DNS in a multi-namespace, multi-node scenario becomes more intreresting of course, and there’s plenty you can configure that’s outside the scope of this article.

Netdelta’s Django settings.py defines a host and database name, and has to be able to find the host:

DATABASES = {
'default': {
'ENGINE': 'django.db.backends.mysql', # Add 'postgresql_psycopg2', 'mysql', 'sqlite3' or 'oracle'.
'NAME': 'netdelta-SITENAME', # Not used with sqlite3.
'USER': 'root', # Not used with sqlite3.
'HOST': mysql-netdelta,
'PASSWORD': 'NOYFB',
'OPTIONS': dict(init_command="SET sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER'"),
}
}

This aspect was poorly documented and was far from obvious: the spec.selector field of the service should match the spec.template.metadata.labels of the pod created by the Deployment.

The Application Hosting in Kubernetes

Referring back to the diagram above, there are pods for each Netdelta site. How was the Docker-hosted version of Netdelta represented in Kubernetes?

The Deployment YAML:

apiVersion: apps/v1 kind: Deployment metadata: creationTimestamp: null labels: app: netdelta-barbican name: netdelta-barbican spec: replicas: 1 selector: matchLabels: app: netdelta-barbican strategy: type: Recreate template: metadata: creationTimestamp: null labels: app: netdelta-barbican spec: containers: - image: registry.netdelta.io/netdelta/barbican:1.0 imagePullPolicy: IfNotPresent name: netdelta-barbican ports: - containerPort: 9004 args: - "barbican" - "9004" - "le" - "certs" resources: {} volumeMounts: - mountPath: /srv/staging name: netdelta-app - mountPath: /srv/logs name: netdelta-logs - mountPath: /le name: le - mountPath: /var/lib/mysql name: data - mountPath: /srv/netdelta_venv name: netdelta-venv imagePullSecrets: - name: regcred volumes: - name: netdelta-app persistentVolumeClaim: claimName: netdelta-app - name: netdelta-logs persistentVolumeClaim: claimName: netdelta-logs - name: le persistentVolumeClaim: claimName: le - name: data persistentVolumeClaim: claimName: data - name: netdelta-venv persistentVolumeClaim: claimName: netdelta-venv restartPolicy: Always serviceAccountName: "" status: {}

Running:

kubectl apply -f netdelta-app-<site>.yaml

Has the effect of creating a pod and a container for the Django application, celery and Apache stack:

┌──(iantibble㉿bionic)-[~] └─$ kubectl get deployments NAME READY UP-TO-DATE AVAILABLE AGE fileserver 1/1 1 1 25d mysql-netdelta 1/1 1 1 25d netdelta-barbican 1/1 1 1 25d ┌──(iantibble㉿bionic)-[~] └─$ kubectl get pods NAME READY STATUS RESTARTS AGE fileserver-6d6bc54f6c-hq8lk 1/1 Running 2 25d mysql-netdelta-5fd7757c66-xqp2j 1/1 Running 2 25d netdelta-barbican-68d78c58bd-vnqdn 1/1 Running 2 25d

K8s Equivakent of Docker Entrypoint Script Parameters

Some other points perhaps worthy of mention were around the Docker v Kubernetes aspects. My docker run command for the netdelta application container was like this:

docker run -it -p 9004:9004 --network netdelta_net --name netdelta_barbican -v netdelta_app:/srv/staging -v netdelta_logs:/srv/logs -v data:/data -v le:/etc/letsencrypt netdelta/barbican:core barbican 9004 le certs

So there’s 4 parameters for the entryscript: site, port, le, and cert. The last two are about letsencrypt certs which won’t be covered here. These are represented in the Kubernetes Deployment YAML in spec.template.spec.containers.args.

Private Image Repository

spec.template.spec.containers.image is set to registry.netdelta.io/netdelta/<site>:<version tag>. Yes, that’s right folks, i’m using a private registry, which is a lot of fun until you realise how hard it is to manage the images there. The setup and management of the private registry won’t be covered here but i found this to be useful.

One other point is about security and encryption in transit for the image pushes and pulls. I’ve been in security for 20 years and have lots of unrestricted penetration testing experience. It shouldn’t be necessary or mandatory to use HTTPS over HTTP in most cases. Admittedly i didn’t spend long trying, but i could not find a way to just use good old clear-text port 80 over 443, which in turn meant i had to configure a SSL certifcate with all the management around it, where the risks are far from justifying such a measure.

PV Mounts

In Dockerland I was using Docker Volumes for persistent storage of logs and application data. I was also using it for the application codebase, and any updates would be sync’d with containers by docker exec wrapped in a BASH script.

There was nothing unexpected in the deployment of the PVCs/PVs, but a couple of points are worth mentioning:

  • PV Filesystem mounts: Netdelta container deployment involves a custom image from COPY (Docker command) of files from a local source to the image. Then the container is run and the application can find the required files. The problem i ran into was about having filesystems mounted over the directories where my application container expected to find files. This meant i had to change my container entryscript to sync with the image when the Pod is deployed, whereas previously the directories were built-out from the docker image build.
  • /tmp as default PV files location: if you SSH to the node (minikube container in my case), you will find the mounted filesystems under /tmp. /tmp is a critical directory for the good health of any Linux-based system and it needs to be 777 (i.e. read and writeable by unauthenticated users and processes) with a sticky bit. This is one that for whatever reason doesn’t find its way into security checklists for Kubernetes but it really does warrant some attention. This can be changed by customising Kubernetes Storage Classes. There’s one pointer here.

Database and Fileserver

The MySQL Database service was deployed as a custom built container with my Docker setup. There was no special reason for this other than to change filesystem permissions, and the fact that the listening service needed to be “exposed” and the database config changed to bind to 0.0.0.0 instead of localhost. What i found with the Kubernetes Pod was that I didn’t need to change the Mysql config at all and spec.ports.targetport had the effect of “exposing” the listening service for the database.

The main reason for using a fileserver in the Dcoker deployment of Netdelta was to act as a container buffer between Docker Volumes and application containers. My my Unix hat on, one is left wondering how filesystem persmissions will work (or otherwise) with file read and writes across network mounted disparate unix systems, where even if the same account names exist on each system, perhaps they have different UIDs (BSD-derived systems use the UID to define ownership, not the name on the account). Moreover it was advised as a best practice measure in the Docker documentation to use an intermediate fileserver. Accordingly this was the way i decided to go with Kubernetes, with a “sidecar” Pod as a fileserver, which mounts the PVs onto the required mount points.

To K8s Or Not To K8s?

When you think about the way that e.g. Minikube is deployed – its a docker container. If you run a docker ps -a, you can see all the mechanics at work. And then if you SSH to the minikube, you can do another docker ps -a, and you see everything to do with Kubernetes pods and containers in the output. This seems like a mess, and if it isn’t, it will do until the mess actually arrives.

Furthermore, you don’t even want to look at the routing tables or network interfaces on the node host. You just cannot unsee that.

There is some considerable complexity here. Further, when you read the documentation for Kubernetes, it does have all the air of documentation written by programmers. We hear a lot about the lack of IT-skilled people, but what is even more lacking, are strategic thinkers (e.g. * [wildcard] Architects) who translate top level business design requirements into programming tactical requirements.

Knowing how Kubernetes works should be enough to know whether it’s really going to be beneficial or not to host your containers there. If you’re not sure you need it, then you probably don’t. In the case of Netdelta, if i have lots and lots of Netdelta sites to manage then i can go with Kubernetes, and now that i have seen Netdelta happily running in Kubernetes with both scheduled celery jobs and manual user-initiated scans, the transition will be a smooth one. In the meantime, I can work with Docker containers alone, with the supporting BASH scripts, whuch are here if you’re interested.

Fintechs and Security – Part One

  • Prologue – covers the overall challenge at a high level
  • Part One – Recruiting and Interviews
  • Part Two – Threat and Vulnerability Management – Application Security
  • Part Three – Threat and Vulnerability Management – Other Layers
  • Part Four – Logging
  • Part Five – Cryptography and Key Management, and Identity Management
  • Part Six – Trust (network controls, such as firewalls and proxies), and Resilience

Recruiting and Interviews

In the prologue of this four-stage process, I set the scene for what may come to pass in my attempt to relate my experiences with fintechs, based on what i am hearing on the street and what i’ve seen myself. In this next instalment, i look at how fintechs are approaching the hiring conundrum when it comes to hiring security specialists, and how, based on typical requirements, things could maybe be improved.

The most common fintech setup is one of public-cloud (AWS, Azure, GCP, etc), They’re developing, or have developed, software for deployment in cloud, with a mobile/web front end. They use devops tools to deploy code, manage and scale (e.g. Kubernetes), collaborate (Git variants) and manage infrastructure (Ansible, Terraform, etc), perhaps they do some SAST. Sometimes they even have different Virtual Private Clouds (VPCs) for different levels of code maturity, one for testing, and one for management. And third party connections with APIs are not uncommon.

Common Pitfalls

  • Fintechs adopt the stance: “we don’t need outside help because we have hipsters. They use acronyms and seem quite confident, and they’re telling me they can handle it”. While not impossible that this can work – its unlikely that a few devops peeps can give a fintech the help they need – this will become apparent later.
  • Using devops staff to interview security engineers. More on this problem later.
  • Testing security engineers with a list of pre-prepared questions. This is unlikely to not end in tears for the fintech. Security is too wide and deep an area for this approach. Fintechs will be rejecting a lot of good candidates by doing this. Just have a chat! For example, ask the candidate their opinions on the usefulness of VA scanners. The length of the response is as important as its technical accuracy. A long response gives an indication of passion for the field.
  • Getting on the security bandwagon too late (such as when you’re already in production!) you are looking at two choices – engage an experienced security hand and ignore their advice, or do not ignore their advice and face downtime, and massive disruption. Most will choose the first option and run the project at massive business risk.

The Security Challenge

Infosec is important, just as checking to see if cars are approaching before crossing the road is important. And the complexity of infosec mandates architecture. Civil engineering projects use architecture. There’s a good reason for that – which doesn’t need elaborating on.

Collapsing buildingWhenever you are trying to build something complex with lots of moving parts, architecture is used to reduce the problem down to a manageable size, and help to build good practices in risk management. The end goal is protective monitoring of an infrastructure that is built with requirements for meeting both risk and compliance challenges.

Because of the complexity of the challenge, it’s good to split the challenge into manageable parts. This doesn’t require talking endlessly about frameworks such as SABSA. But the following six capabilities (people, process, technology) approach is sleek and low-footprint enough for fintechs:

  • Threat and Vulnerability Management (TVM)
  • Logging – not “telemetry” or Threat intelligence, or threat hunting. Just logging. Not even necessarily SIEM.
  • Cryptography and Key Management
  • Identity Management
  • Business Continuity Management
  • Trust (network segmentation, firewalls, proxies).

I will cover these 6 areas in the next two articles, in more detail.

The above mentioned capabilities have an engineering and architecture component and cover very briefly the roles of security engineers and architects. A SABSA based approach without the SABSA theory can work. So an architect takes into account risk (maybe with a threat modelling approach) and compliance goals in a High Level Design (HLD), and generates requirements for the Low Level Design (LLD), which will be compiled by a security engineer. The LLD gives a breakdown of security controls to meet the requirements of the HLD, and how to configure the controls.

Security Engineers and Devops Tools

What happens when a devops peep interviews a security peep? Well – they only have their frame of reference to go by. They will of course ask questions about devops tools. How useful is this approach? Not very. Is this is good test of a security engineer? Based on the security requirements for fintechs, the answer is clear.

Security engineers can use devops tools, and they do, and it doesn’t take a 2 week training course to learn Ansible. There is no great mystery in Kubernetes. If you hire a security engineer with the right background (see the previous post in this series) they will adapt easily. The word on the street is that Terraform config isn’t the greatest mystery in the world and as long as you know Linux, and can understand what the purpose of the tool is (how it fits in, what is the expected result), the time taken to get productive is one day or less.

The point is: if i’m a security engineer and i need to, for example, setup a cloud SIEM collector: some fintechs will use one Infrastructure As Code (IaC) tool, others use another one – one will use Chef, another Ansible, and there are other permutations. Is a lack of familiarity with the tool a barrier to progress? No. So why would you test a security engineer’s suitability for a fintech role by asking questions about e.g. stanzas in Ansible config? You need to ask them questions about the six capabilities I mentioned above – i.e. security questions for a security professional.

Security Engineers and Clouds

Again – what was the transition period from on-premise to cloud? Lets take an example – I know how networking works on-premise. How does it work in cloud? There is this thing called a firewall on-premise. In Azure it’s called a Network Security Group. In AWS its called a …drum roll…firewall. In Google Cloud its called a …firewall. From the web-based portal UI for admin, these appear to filter by source and destination addresses and services, just like an actual non-virtual firewall. They can also filter by service account (GCP), or VM tag.

There is another thing called VPN. And another thing called a Virtual Router. On the world of on-premise, a VPN is a …VPN. A virtual router is a…router. There might be a connection here!

Cloud Service Providers (CSP) in general don’t re-write IT from the ground up. They still use TCP/IP. They host virtual machines (VM) instead of real machines, but as VMs have operating systems which security engineers (with the right background) are familiar with, where is the complication here?

The areas that are quite new compared to anything on-premise are areas where the CSP has provided some technology for a security capability such as SIEM, secrets management, or Identity Management. But these are usually sub-standard for the purpose they were designed for – this is deliberate – the CSPs want to work with Commercial Off The Shelf (COTS) vendors such as Splunk and Qualys, who will provide a IaaS or SaaS solution.

There is also the subject of different clouds. I see some organisations being fussy about this, e.g. a security engineer who worked a lot with Azure but not AWS, is not suitable for a fintech that uses AWS. Apparently. Well, given that the transition from on-premise to cloud was relatively painless, how painful is it to transition from Azure to AWS or …? I was on a project last summer where the fintech used Google Cloud Platform. It was my first date with GCP but I had worked with AWS and Azure before. Was it a problem? No. Do i have an IQ of 160? Hell no!

The Wrap-up

Problems we see in fintech infosec hiring represent what is most likely a lack of understanding of how they can best manage risk with a budget that is considerably less than a large MNC for example. But in security we haven’t been particularly helpful for fintechs – the problem is on us.

The security challenge for fintechs is not just about SAST/DAST of their code. The challenge is wider and be represented as six security capabilities that need to be designed with an architecture and engineering view. This sounds expensive, but its a one-off design process that can be covered in a few weeks. The on-going security challenge, whereby capabilities are pushed through into the final security operations stage, can be realised with one or two security engineers.

The lack of understanding of requirements in security leads to some poor hiring practices, the most common of which is to interview a security engineer with a devops guru. The fintech will be rejecting lots of good security engineers with this approach.

In so many ways, the growth of small to medium development houses has exposed the weaknesses in the infosec sector more than they were ever exposed with large organisations. The lack of the sector’s ability to help fintechs exposes a fundamental lack of skilled personnel, more particularly at the strategic/advisory level than others.

On Hiring For DevSecOps

Based on personal experience, and second hand reports, there’s still some confusion out there that results in lots of wasted time for job seekers, hiring organisations, and recruitment agents.

There is a want or a need to blame recruiters for any hiring difficulties, but we need to stop that. There are some who try to do the right thing but are limited by a lack of any sector experience. Others have been inspired by Wolf Of Wall Street while trying to sound like Simon Cowell.

It’s on the hiring organisation? Well, it is, but let’s take responsibility for the problem as a sector for a change. Infosec likes to shift responsibility and not take ownership of the problem. We blame CEOs, users, vendors, recruiters, dogs, cats, “Russia“, “China” – anyone but ourselves. Could it be we failed as a sector to raise awareness, both internally and externally?

So What Are Common Understandings Of Security Roles?

After 25 years+ we still don’t have universally accepted role descriptions, but at least we can say that some patterns are emerging. Security roles involve looking at risk holistically, and sometimes advising on how to deal with risk:

  • Security Engineers assess risk and design and sometimes also implement controls. BTW some sectors, legal in particular, still struggle with this. Someone who installs security products is in an IT ops role. Someone who upgrades and maintains a firewall is an IT ops role. The fact that a firewall is a security control doesn’t make this a security engineering function.
  • Security Architects take risk and compliance goals into account when they formulate requirements for engineers.
  • Security Analysts are usually level 2 SOC analysts, who make risk assessments in response to an alert or vulnerability, and act accordingly.

This subject evokes as much emotion as CISSP. There are lots of opinions out there. We owe to ourselves to be objective. There are plenty of sources of information on these role definitions.

No Aspect Of Risk Assessment != Security. This is Devops.

If there is no aspect of risk involved with a role, you shouldn’t looking for a security professional. You are looking for DEVOPS peeps. Not security peeps.

If you want a resource to install and configure tools in cloud – that is DEVOPS. It is not Devsecops. It is not Security Engineering or Architecture. It is not Landscape Architecture or Accounting. It is not Professional Dog Walker. it is DEVOPS. And you should hire a DEVOPS person. If you want a resource to install and configure appsec tools for CI/CD – that is DEVOPS. If you want a resource to advise on or address findings from appsec tools, that is a Security Analyst in the first case, DEVSECOPS in the 2nd case. In the 2nd case you can hire a security bod with coding experience – they do exist.

Ok Then So What Does A DevSecOps Beast Look Like?

DevSecOps peeps have an attack mindset from their time served in appsec/pen testing, and are able to take on board the holistic view of risk across multiple technologies. They are also coders, and can easily adapt to and learn multiple different devops tools. This is not a role for newly graduated peeps.

Doing Security With Non-Security Professionals Is At Best Highly Expensive

Another important point: what usually happens because of the skills gap in infosec:

  • Cloud: devops fills the gap.
  • On-premise: Network Engineers fill the gap.

Why doesn’t this work? I’ve met lots of folk who wear the aforementioned badges. Lots of them understand what security controls are for. Lots of them understand what XSS is. But what none of them understand is risk. That only comes from having an attack mindset. The result will be overspend usually – every security control ever conceived by humans will be deployed, while also having an infrastructure that’s full of holes (e.g. default install IDS and WAF is generally fairly useless and comes with a high price tag).

Vulnerability assessment is heavily impacted by not engaging security peeps. Devops peeps can deploy code testing tools and interpret the output. But a lack of a holistic view or an attack mindset, will result in either no response to the vulnerability, or an excessive response. Basically, the Threat And Vulnerability Management capability is broken under these circumstances – a sadly very common scenario.

SIEM/Logging is heavily impacted – what will happen is either nothing (default logging – “we have Stackdriver, we’re ok”), or a SIEM tool will be provisioned which becomes a black hole for events and also budgets. All possible events are configured from every log source. Not so great. No custom use cases will be developed. The capability will cost zillions while also not alerting when something bad is going down.

Identity Management – is not deploying a ForgeRock (please know what you’re getting into with this – its a fork of Sun Microsystems/Oracle’s identity management show) or an Azure AD and that’s it, job done. If you just deploy this with no thought of the problem you’re trying to solve in identity management, you will be fired.

One of the classic risk problems that emerges when no security input is taken: “there is no personally identifiable information in development Virtual Private Clouds, so there is no need for security controls”. Well – intelligence vulnerability such as database schema – attackers love this. And don’t you want your code to be safe and available?

You see a pattern here. It’s all or nothing. Either of which ends up being very expensive or worse. But actually come to think of it, expensive is the goal in some cases. Hold that thought maybe.

A Final Word

So – if the word risk doesn’t appear anywhere in the job description, it is nothing to do with security. You are looking for devops peeps in this case. And – security is an important consideration for cloud migrations.