Windows SIEM – Optimizing Events Volume with CIS Benchmarks and AuditpolCIS

In our 2021 blog post, we focused on identifying quick wins for optimizing Windows Events, and provided a free spreadsheet (really free, not even a regwall) that indicated Windows Events that could be safely ignored, some of which cost lots for SIEM engines to ingest. This post takes a broader Windows Audit Policy view, and offers another free resource – this time taking a broader look in the context of comparing your setup for Windows Audit Policy, and the venerable CIS Benchmark for Windows 2019 Server.

If there’s sufficient interest i’ll follow up with a development effort for a Python tool (also freely available, on Github) that connects to your Windows server and performs the CIS Benchmark assessment as indicated in the spreadsheet.

SIEM Nightmares

Based on many first hand observations and second hand accounts, it’s not a stretch to say that many organisations are suffering from SIEM configuration issues, for which the result is a low signal-to-noise ratio. Your SIEM is ingesting lots of events, many of which are not at all helpful, and with most vendors charging by volume, it gets expensive. At the same time, the false negative problem is all too common. Forensics investigations reveal all too often that there are no events recorded by the expensive SIEM, that even closely relate to the incident. I hope you are never in this scenario. The short-term impact is never good.

Taking SIEM as a capability, if one is to advise on how to improve things, it is rarely ever about the technology. When one asks Analysts (and based on job postings, also hiring managers) about SIEM, it’s clear the first thing that comes to mind is Splunk. ELK, Sentinel, etc. I would estimate the technology-only focus with SIEM to be the norm rather than the exception, and it comes hand-in-hand with a failure to detect privilege elevations, and lateral movements for example.

There are some advisories that we can give out that are independent of your architecture, but many questions about SIEM configuration can only be answered by you, using your knowledge of the IT landscape in your organisation. The advisories in the referenced spreadsheet cover the “noise” part of the signal-to-noise ratio. These are events that are sure to be noise to at least a 90% level of assurance, from a security perspective.

Addtional Context on the Spreadsheet

Some context around the spreadsheet: where there is a CIS Benchmark metric for a specific Audit Subcategory, the spreadsheet follows exactly the CIS recommended setting. But there are some (e.g. DS Access –> Directory Service Access) where this subcategory was not covered by CIS. In these cases, an assessment is made based on our real-experience observations of logging volumes, versus the security (not the IT diagnostic, or other value) value of Audit Subcategories. In this case of the Directory Service Access subcategory, it can be turned off from a security perspective.

There is limited information available regarding actual experiences with specific event ID volumes. In 2018, I had the opportunity to track Windows events in a Splunk architecture for a government department. During this time, I recorded the occurrences of events over a 24-hour period on a network of approximately 150 Windows servers of various versions, some of which were quite exotic. This information has been valuable in supporting decisions related to whether or not to disable auditing.

SIEM Forwarder Filtering

There is another option offered by some SIEM vendors and that is to filter events by Event ID. Overall, the more resource-friendly approach is to prevent the events being generated at source, but in many cases this may not be feasible. Splunk for example allows you to filter at forwarders (via the inputs.conf file on the Splunk forwarder. This file is usually located in the $SPLUNK_HOME/etc/system/local/ directory … more info – BTW it looks like Splunk agrees with us on the 4662 event mentioned as an example above. Yay!).

Credits and Disclaimers

Windows Events are sometimes tricky to understand, both with respect of what the developers intended with those events, and the conditions under which they are generated. Sometimes with Windows Events, we are completely in unknown territory, even if there is some Microsoft documentation that covers them. Here’s one example from Microsoft documentation to fill us with confidence – “This auditing subcategory should not have any events in it, but for some reason Success auditing will enable the generation of event 4985″.

Ultimately only you can decide what’s best for the health of your SOC/SIEM. Only you know your network and your applications. The document supplied here was only intended as a guide, and to aid decision making. It was not intended to make decisions for you.

The cybersecurity landscape often focuses on the more sensational aspects, such as high-profile hacks or fake influencers, which can overshadow the essential work done by countless professionals in the background. These unsung heroes are dedicated to ensuring the stability and security of our digital infrastructure, and their contributions should not be underestimated. Among those are tthe likes of Randy Franklin Smith (founder of Ultmate Windows Security) who has put together an “encyclopedia” of Windows Event IDs. The experiences shared there were used in-part to form a view on whether or not to reject or accept certain Windows Events.

SIEM – Windows Events Quick Win

There has been a modicum of interest in a Windows spreadsheet I shared on social media recently, that if absorbed and acted upon, can be a early no-brainer win with SIEM products that are licensed based on volume or Events Per Second (EPS).

Its no big secret that Windows machines, virtual or real, are noisy. Clients I worked with – I would estimate 90%, for various reasonsdon’t act on the noise from Windows devices and it’s costing them a fortune (right or wrong, approx 50% of those prioritise other tasks).

In Splunk, one can use searches to estimate the benefit of removing noisy Windows events, and what I found was quite a broad range of results. It makes little sense to give the full breakdown because the result depends heavily on the spread and amount of Windows to other Operating Systems (OS). But there were a couple of cases where logging events volume was reduced by 70%.

Some points to note:

  • If the “remove” events are removed, Windows devices become very quiet. Some organisations use events as an indicator of “alive” rather than using active host monitoring. So with this logging configuration, an alternative (more sensible) host monitoring method is needed.
  • Removing these events is highly unlikely to ever result in a failure to detect an attack, but being 100% certain of this is impossible.
  • The most critical aspect of logging isn’t related to these events at all, its about your custom use cases. An example: a usual scenario is for a database listening service to accept application level connections on its listening service port (e.g. 1521 TCP is default for Oracle DB), and the source will be a web or middleware tier. So – configure an alert for when connections come from a source other than the middleware/application tier.
  • Very little actual analysis of Windows events and their purpose is known, or if it is known it is certainly not shared anywhere. There are some historical aspects to many of these events in that they’ve been around for more than 20 years but were never documented particualrly well, apart from here. I have added some insight but not for all events. Hence: if anyone would like any of the contents added or edited, feel free to comment below.
  • The context here is security. For other logging use cases, other events may need to be switched on.
  • The major versions of MS Windows Server that this journal applies to are: 2003, 2008, 2012. Many will apply to both 2016 and 2019.

So here are the links.. note there is no reg or pay wall. You will not be tracked and no data will be held about you. This is a completely free resource for you to collect anonymously:

Kubernetes Migration Case Study

Migrating Netdelta From Docker to Kubernetes

In latish 2020, I moved Netdelta from a Docker deployment to Kubernetes, partly to see what all this Kubernetes jazz is about, and partly to investigate whether it would help me with the management of Netdelta containers for different punters, each of whom has their own docker container and Apache listening service.

I studiously went through the Kubernetes quick tutorial and found i had to investigate the documentation some more. Even then some aspects weren’t covered so well. This post explains what i did to deploy an app into Kubernetes, and some of the gotchas i encountered along the way, that were not covered so well in the Kubernetes documentation, and I summarise with a view of Kubernetes and give my view on: is the hype justified? Will I continue to host Netdelta in Kubernetes?

This is not a Kubernetes tutorial – it does assume some prior exposure on behalf of the reader, but nonethless links to the relevant documentation when some Kubernetes concepts are covered.

Contents

Netdelta in Docker

This post isn’t about Netdelta, but for illustrative purposes: Netdelta aids with the detection of unauthorised changes, and hacker shells, by running one-off port scans, or scheduled jobs, comparing the results with the previous scan, and alerting on changes. This is more chunky than it sounds, mostly because of the analytics that goes into false positives detection. In the Kubernetes implementation, scan results are held in a stateful persistent volume with MySQL.

Netdelta’s docker config can be dug into here, but to summarise the docker setup:

  • Database container – MySQL 5.7
  • Application container – Apache, Django 3.1.4, Celery 5.0.5, Netdelta
  • Fileserver (logs, virtualenv, code deployment)
  • Docker volumes and networking are utilised

Data Flows / Networking

The data flows aspect reflects what is not exactly a bare metal deployment. A Linode-hosted VM running Ubuntu 20 is the host, then the Kubernetes node is minikube, with another node running on a Raspberry pi 3 – the latter aspect not being a production facility. The pi 3 was only to test how well the config would work with load balancing, and Kubernetes Replicasets across nodes.

Reverse Proxy

Ingress connections from the internet are handled first by nginx acting as a reverse proxy. Base URLs for Netdelta are of the form https://www.netdelta.io/<site>. The nginx config …

server {
    listen 80;
    location /barbican {
	proxy_set_header Accept-Encoding "";
	sub_filter_types text/html text/css text/xml;
	sub_filter $host $host/barbican;
        proxy_pass http://local.netdelta.io/barbican;
    }
}

K8s Ingress Controller

This is passing a URL with a first level of <site> to be processed at local.netdelta.io, which is locally resolvable, and is localhost. This is where the nginx Kubernetes Ingress Controller comes into play. The pods in kubernetes have NodePorts configured but these aren’t necessary. The nginx ingress controller takes connections on port 80, and routes based on service names and the defined listening port:

┌──(iantibble㉿bionic)-[~]
└─$ kubectl describe ingress
Name:             netdelta-ingress
Namespace:        default
Address:          172.17.0.2
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
  Host               Path  Backends
  ----               ----  --------
  local.netdelta.io
                     /barbican   netdelta-barbican:9004 (<none>)
Annotations:         <none>
Events:              <none>

The YAML looks thusly:

┌──(iantibble㉿bionic)-[~/netdd/k8s]
└─$ cat ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: netdelta-ingress
spec:
  rules:
    - host: local.netdelta.io
      http:
        paths:
          - path: /barbican
            backend:
              service:
                name: netdelta-barbican
                port:
                  number: 9004
            pathType: Prefix

So the nginx ingress controllers sees the connection forwarded from local.netdelta.io with a URL request of local.netdelta.io/<site>. The requests matches a rule, and forwards to the Kubernetes Service of the same name. The entity that actually answers the call is a docker container masquerading as a Kubernetes Pod, which is part of a deployment. The next step in the data flow is to route the connection to the specified Kubernetes Service which is covered briefly here but in more detail later in the coverage of DNS.

The “service” aspect has the effect of exposing the pod according to the service setup:

┌──(iantibble㉿bionic)-[~/netdd/k8s]
└─$ kubectl get services -o wide
NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE    SELECTOR
kubernetes          ClusterIP   10.96.0.1                443/TCP          119d   
mysql-netdelta      ClusterIP   10.97.140.111            3306/TCP         39d    app=mysql-netdelta
netdelta-barbican   NodePort    10.103.160.223           9004:30460/TCP   36d    app=netdelta-barbican
netdelta-xynexis    NodePort    10.102.53.156            9005:31259/TCP   36d    app

DNS

There’s an awful lot of waffle out there about DNS and Kubernetes. Basically – and I know the god of devops won’t let me in heaven for saying this, but making a service in Kubernetes leads to DNS being enabled. DNS in a multi-namespace, multi-node scenario becomes more intreresting of course, and there’s plenty you can configure that’s outside the scope of this article.

Netdelta’s Django settings.py defines a host and database name, and has to be able to find the host:

DATABASES = {
'default': {
'ENGINE': 'django.db.backends.mysql', # Add 'postgresql_psycopg2', 'mysql', 'sqlite3' or 'oracle'.
'NAME': 'netdelta-SITENAME', # Not used with sqlite3.
'USER': 'root', # Not used with sqlite3.
'HOST': mysql-netdelta,
'PASSWORD': 'NOYFB',
'OPTIONS': dict(init_command="SET sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER'"),
}
}

This aspect was poorly documented and was far from obvious: the spec.selector field of the service should match the spec.template.metadata.labels of the pod created by the Deployment.

The Application Hosting in Kubernetes

Referring back to the diagram above, there are pods for each Netdelta site. How was the Docker-hosted version of Netdelta represented in Kubernetes?

The Deployment YAML:

apiVersion: apps/v1 kind: Deployment metadata: creationTimestamp: null labels: app: netdelta-barbican name: netdelta-barbican spec: replicas: 1 selector: matchLabels: app: netdelta-barbican strategy: type: Recreate template: metadata: creationTimestamp: null labels: app: netdelta-barbican spec: containers: - image: registry.netdelta.io/netdelta/barbican:1.0 imagePullPolicy: IfNotPresent name: netdelta-barbican ports: - containerPort: 9004 args: - "barbican" - "9004" - "le" - "certs" resources: {} volumeMounts: - mountPath: /srv/staging name: netdelta-app - mountPath: /srv/logs name: netdelta-logs - mountPath: /le name: le - mountPath: /var/lib/mysql name: data - mountPath: /srv/netdelta_venv name: netdelta-venv imagePullSecrets: - name: regcred volumes: - name: netdelta-app persistentVolumeClaim: claimName: netdelta-app - name: netdelta-logs persistentVolumeClaim: claimName: netdelta-logs - name: le persistentVolumeClaim: claimName: le - name: data persistentVolumeClaim: claimName: data - name: netdelta-venv persistentVolumeClaim: claimName: netdelta-venv restartPolicy: Always serviceAccountName: "" status: {}

Running:

kubectl apply -f netdelta-app-<site>.yaml

Has the effect of creating a pod and a container for the Django application, celery and Apache stack:

┌──(iantibble㉿bionic)-[~] └─$ kubectl get deployments NAME READY UP-TO-DATE AVAILABLE AGE fileserver 1/1 1 1 25d mysql-netdelta 1/1 1 1 25d netdelta-barbican 1/1 1 1 25d ┌──(iantibble㉿bionic)-[~] └─$ kubectl get pods NAME READY STATUS RESTARTS AGE fileserver-6d6bc54f6c-hq8lk 1/1 Running 2 25d mysql-netdelta-5fd7757c66-xqp2j 1/1 Running 2 25d netdelta-barbican-68d78c58bd-vnqdn 1/1 Running 2 25d

K8s Equivakent of Docker Entrypoint Script Parameters

Some other points perhaps worthy of mention were around the Docker v Kubernetes aspects. My docker run command for the netdelta application container was like this:

docker run -it -p 9004:9004 --network netdelta_net --name netdelta_barbican -v netdelta_app:/srv/staging -v netdelta_logs:/srv/logs -v data:/data -v le:/etc/letsencrypt netdelta/barbican:core barbican 9004 le certs

So there’s 4 parameters for the entryscript: site, port, le, and cert. The last two are about letsencrypt certs which won’t be covered here. These are represented in the Kubernetes Deployment YAML in spec.template.spec.containers.args.

Private Image Repository

spec.template.spec.containers.image is set to registry.netdelta.io/netdelta/<site>:<version tag>. Yes, that’s right folks, i’m using a private registry, which is a lot of fun until you realise how hard it is to manage the images there. The setup and management of the private registry won’t be covered here but i found this to be useful.

One other point is about security and encryption in transit for the image pushes and pulls. I’ve been in security for 20 years and have lots of unrestricted penetration testing experience. It shouldn’t be necessary or mandatory to use HTTPS over HTTP in most cases. Admittedly i didn’t spend long trying, but i could not find a way to just use good old clear-text port 80 over 443, which in turn meant i had to configure a SSL certifcate with all the management around it, where the risks are far from justifying such a measure.

PV Mounts

In Dockerland I was using Docker Volumes for persistent storage of logs and application data. I was also using it for the application codebase, and any updates would be sync’d with containers by docker exec wrapped in a BASH script.

There was nothing unexpected in the deployment of the PVCs/PVs, but a couple of points are worth mentioning:

  • PV Filesystem mounts: Netdelta container deployment involves a custom image from COPY (Docker command) of files from a local source to the image. Then the container is run and the application can find the required files. The problem i ran into was about having filesystems mounted over the directories where my application container expected to find files. This meant i had to change my container entryscript to sync with the image when the Pod is deployed, whereas previously the directories were built-out from the docker image build.
  • /tmp as default PV files location: if you SSH to the node (minikube container in my case), you will find the mounted filesystems under /tmp. /tmp is a critical directory for the good health of any Linux-based system and it needs to be 777 (i.e. read and writeable by unauthenticated users and processes) with a sticky bit. This is one that for whatever reason doesn’t find its way into security checklists for Kubernetes but it really does warrant some attention. This can be changed by customising Kubernetes Storage Classes. There’s one pointer here.

Database and Fileserver

The MySQL Database service was deployed as a custom built container with my Docker setup. There was no special reason for this other than to change filesystem permissions, and the fact that the listening service needed to be “exposed” and the database config changed to bind to 0.0.0.0 instead of localhost. What i found with the Kubernetes Pod was that I didn’t need to change the Mysql config at all and spec.ports.targetport had the effect of “exposing” the listening service for the database.

The main reason for using a fileserver in the Dcoker deployment of Netdelta was to act as a container buffer between Docker Volumes and application containers. My my Unix hat on, one is left wondering how filesystem persmissions will work (or otherwise) with file read and writes across network mounted disparate unix systems, where even if the same account names exist on each system, perhaps they have different UIDs (BSD-derived systems use the UID to define ownership, not the name on the account). Moreover it was advised as a best practice measure in the Docker documentation to use an intermediate fileserver. Accordingly this was the way i decided to go with Kubernetes, with a “sidecar” Pod as a fileserver, which mounts the PVs onto the required mount points.

To K8s Or Not To K8s?

When you think about the way that e.g. Minikube is deployed – its a docker container. If you run a docker ps -a, you can see all the mechanics at work. And then if you SSH to the minikube, you can do another docker ps -a, and you see everything to do with Kubernetes pods and containers in the output. This seems like a mess, and if it isn’t, it will do until the mess actually arrives.

Furthermore, you don’t even want to look at the routing tables or network interfaces on the node host. You just cannot unsee that.

There is some considerable complexity here. Further, when you read the documentation for Kubernetes, it does have all the air of documentation written by programmers. We hear a lot about the lack of IT-skilled people, but what is even more lacking, are strategic thinkers (e.g. * [wildcard] Architects) who translate top level business design requirements into programming tactical requirements.

Knowing how Kubernetes works should be enough to know whether it’s really going to be beneficial or not to host your containers there. If you’re not sure you need it, then you probably don’t. In the case of Netdelta, if i have lots and lots of Netdelta sites to manage then i can go with Kubernetes, and now that i have seen Netdelta happily running in Kubernetes with both scheduled celery jobs and manual user-initiated scans, the transition will be a smooth one. In the meantime, I can work with Docker containers alone, with the supporting BASH scripts, whuch are here if you’re interested.

Netdelta – Install and Configure

Netdelta is a tool for monitoring networks and flagging alerts upon changes in advertised services. Now – I like Python and especially Django, and around 2014 or so i was asked to setup a facility for monitoring for changes in that organisation’s perimeter. After some considerable digging, i found nada, as in nothing, apart from a few half-baked student projects. So i went off and coded Netdelta, and the world has never been the same since.

I guess when i started with Netdelta i didn’t see it as a solution that would be widely popular because i was under the impression you could just do some basic shell scripting with nmap, and ndiff is specifically designed for delta flagging. However what became apparent at an early stage was that timeouts are a problem. A delta will be flagged when a host or service times out – and this happens a lot, even on a gigabit LAN, and it happens even more in public clouds. I built some analytics into Netdelta that looks back over the scan history and data and makes a call on the likelihood of a false positive (red, amber green).

Most organisations i worked with would benefit from this. One classic example i can think of – a trading house that had been on an aggressive M&A spree, maybe it was Black Friday or…? Anyway – they fired some network engineers and hired some new and cheaper ones, exacerbating what was already a poorly managed perimeter scenario. CISO wanted to know what was going off with these Internet facing subnets – enter Netdelta. Unauthorised changes are a problem! I am directly aware of no fewer than 6 incidents that occurred as a result of exposed SSH, SMB (Wannacry), and more recently RDP, and indirectly aware of many more.

Anyway without further waffle, here’s how you get Netdelta up and running. Warning – there are a few moving parts, but if someone wants it in Docker, let me know.

I always go with Ubuntu. The differences between Linux distros are like the differences between mueslis. My build was on 18.04 but its highly likely 19 variants will be just fine.

apt-get update
apt-get install curl nmap apache2 python3 python3-pip python3-venv rabbitmq-server mysql-server libapache2-mod-wsgi-py3 git
apt-get -y install software-properties-common
add-apt-repository ppa:certbot/certbot
apt-get install -y python-certbot-apache
Clone the repository from github into your <netdelta root> 
git clone https://github.com/SevenStones/netdelta.git

Filesystem

Create the user that will own <netdelta root>

useradd -s /bin/bash -d /home/<user> -m <user>

Create the directory that will host the Netdelta Django project if necessary

Add the user to a suitable group, and strip world permissions from netdelta directories

 groupadd <group> 
 usermod -G  <group> <user>
 usermod -G  <group> www-data
 chown -R www-data:<group> /var/www
 chown -R <user>:<group> <netdelta root> 

Make the logs dir, e.g. /logs, and you will need to modify /nd/netdelta_logger.py to point to this location. Note the celery monitor logs go to /var/log/celery/celery-monitor.log …which of course you can change.

Strip world permissions from all netdelta and apache root dirs:

chmod -Rv o-rwx <web root>
chmod -Rv o-rwx <netdelta root>

Virtualenv

The required packages are in requirements.txt, in the root of the git repo. Your virtualenv build with Python 3 goes approximately like this …

python3 -m venv /path/to/new/virtual/environment

You activate thusly: source </path/to/new/virtual/environment/>/bin/activate

Then suck in the requirements as root, remembering to fix permissions after you do this.

pip3 install wheel
pip3 install -r <netdelta root>/requirements.txt

You can use whatever supported database you like. MySQL is assumed here.

The Python framework mysqlclient was used with earlier versions of Django and MySQL. but with Python 3 and later Django versions, the word on the street is PyMySQL is the way to go. With this though, it took some trickery to get the Django project up and running; in the form of init.py for the project (<netdelta root>/netdelta/.init.py) and adding a few lines …

import pymysql 
pymysql.install_as_MySQLdb()

While in virtualenv, and under your netdelta root, add a superuser for the DF

Patch libnmap

Two main mods to the libnmap in usage with Netdelta were necessary. First, with later versions of Celery (>3.1), there was a security issue with “deamonic processes are not allowed to have children”, for which an alternative fork of libnmap fixed the problem. Then we needed to return to Netdelta the process id of the running nmap port scanner process.

cd /opt
git clone https://github.com/pyoner/python-libnmap.git
cp ./python-libnmap/libnmap/process.py <virtualenv root>/lib/python3.x/site-packages/libnmap/

then patch libnmap to allow Netdelta to kill scanning processes

<netdelta_root>/scripts/fix-libnmap.bash

Change the environment variables to match your install and use the virtualenv name as a parameter

Database Setup

Create a database called netdetla and use whichever encoding snd collation you like.

CREATE DATABASE netdelta CHARACTER SET utf8 COLLATE utf8_general_ci;

Then from <netdelta root> with virtualenv engaged:

python manage.py makemigrations nd
python manage.py migrate

Web Server

I am assuming all you good security pros don’t want to use the development server? Well as you’re only dealing with port scan data then….your call. I’m assuming Apache as a production web server.

You will need to give Apache a stub web root and enable the wsgi module. For the latter i added this to apache2.conf – this gives you some control over the exact version of Python loaded.

LoadModule wsgi_module "/usr/lib/python3.7/site-packages/mod_wsgi/server/mod_wsgi-py37.cpython-37m-i386-linux-gnu.so"
WSGIPythonHome "/usr"

Under the DocumentRoot line in your apache config file, give the pointers for WSGI.

WSGIDaemonProcess <site> python-home=<virtualenv root> python-path=<netdelta root>  WSGIProcessGroup <site>  WSGIScriptAlias / <netdelta root>/netdelta/wsgi.py  Alias /static/ <netdelta root>/netdelta/

Note also you will need to adjust your wsgi.py under <netdelta root>/netdelta/ –

# Add the site-packages of the chosen virtualenv to work with site.addsitedir('<virtualenv root>/lib/python3.7/site-packages')

Celery

From the current shell
in …<virtualenv> ….under <netdelta root>

celery worker -E -A nd -n default -Q default --loglevel=info -B --logfile=<netdelta root>/logs/celery.log

Under systemd (you will almost certainly want to do this) with the root user. The script pointed to by systemd for

systemctl start celery

can have all the environment checking (this isn’t intended to be a tutorial in BASH scripting), but the core of it…

cd <netdelta root>
nohup $VIRTUALENV_DIR/bin/celery worker -E -A nd -n ${SITE} -Q ${SITE} --loglevel=info -B --logfile=${SITE_LOGS}/celery.log >/dev/null 2>&1 &

And then you can put together your own scripts for status, stop, restart.

Fintechs and Security – Part Three

  • Prologue – covers the overall challenge at a high level
  • Part One – Recruiting and Interviews
  • Part Two – Threat and Vulnerability Management – Application Security
  • Part Three – Threat and Vulnerability Management – Other Layers
  • Part Four – Logging
  • Part Five – Cryptography and Key Management, and Identity Management
  • Part Six – Trust (network controls, such as firewalls and proxies), and Resilience
Threat and Vulnerability Management (TVM) – Other Layers

This article covers the key principles of vulnerability management for cloud, devops, and devsecops, and herein addresses the challenges faced by fintechs.

The previous post covered TVM from the application security point of view, but what about everything else? Being cloud and “dynamic”, even with Kubernetes and the mythical Immutable Architecture, doesn’t mean you don’t have to worry about the security of the operating systems and many devices in your cloud. The devil loves to hear claims to the effect that devops never SSHs to VM instances. And does SaaS help? Well that depends if SaaS is a good move – more on that later.

Fintechs are focussing on application security, which is good, but not so much in the security of other areas such as containers, IaaS/SaaS VMs, and little thought is ever given to the supply of patches and container images (they need to come from an integral source – preferably not involving pulling from the public Internet, and the patches and images need to be checked for integrity themselves).

And in general with vulnerability assessment (VA), we in infosec are still battling a popular misconception, which after a quarter of a decade is still a popular misconception – and that is the value, or lack of, of unauthenticated scanners such as OpenVAS and Nessus. More on this later.

The Overall Approach

The design process for a TVM capability was covered in Part One. Capabilities are people, process, and technology. They’re not just technology. So the design of TVM is not as follows: stick an OpenVAS VM in a VPC, fill it with target addresses, send the auto-generated report to ops. That is actually how many fintechs see the TVM challenge, or they just see it as being a purely application security show.

So there is a vulnerability reported. Is it a false positive? If not, then what is the risk? And how should the risk be treated? In order to get a view of risk, security professionals with an attack mindset need to know

  • the network layout and data flows – think from the point of view of an attacker – so for example if a front end web micro-service is compromised, what can the attacker can do from there? Can they install recon tools such as a port scanner or sniffer locally and figure out where the back end database is? This is really about “trust relationships”. That widget that routes connections may in itself seem like a device that isn’t worthy of attention, but it routes connections to a database hosting crown jewels…you can see its an important device and its configuration needs some intense scrutiny.
  • the location and sensitivity of critical information assets.
  • The ease and result of an exploit – how easy is it to gain a local shell presence and then what is the impact?

The points above should ideally be covered as part of threat modelling, that is carried out before any TVM capability design is drafted.

if the engineer or analyst or architect has the experience in CTF or simulated attack, they are in a good position to speak confidently about risk.

Types of Tool

I covered appsec tools in part two.

There are two types: unauthenticated and credentialed or authenticated scanners.

Many years ago i was an analyst running VA scans as part of an APAC regional accreditation service. I was using Nessus mostly but some other tools also. To help me filter false positives, I set up a local test box with services like Apache, Sendmail, etc, pointed Nessus at the box, then used Ethereal (now Wireshark) to figure out what the scanner was actually doing.

What became abundantly obvious with most services, is that the scanner wasn’t actually doing anything. It grabs a service banner and then …nothing. tumbleweed

I thought initially there was a problem with my setup but soon eliminated that doubt. There are a few cases where the scanner probes for more information but those automated efforts are somewhat ineffectual and in many cases the test that is run, and then the processing of the result, show a lack of understanding of the vulnerability. A false negative is likely to result, or at best a false positive. The scanner sees a text banner response such as “apache 2.2.14”, looks in its database for public disclosed vulnerability for that version, then barfs it all out as CRITICAL, red colour, etc.

Trying to assess vulnerability of an IaaS VM with unauthenticated VA scanners is like trying to diagnose a problem with your car without ever lifting the hood/bonnet.

So this leads us to credentialed scanners. Unfortunately the main players in the VA space pander to unauthenticated scans. I am not going to name vendors here, but its clear the market is poorly served in the area of credentialed scanning.

It’s really very likely that sooner rather than later, accreditation schemes will mandate credentialed scanning. It is slowly but surely becoming a widespread realisation that unauthenticated scanners are limited to the above-mentioned testing methodology.

So overall, you will have a set of Technical Security Standards for different technologies such as Linux, Cisco IoS, Docker, and some others. There are a variety of tools out there that will get part of the job done with the more popular operating systems and databases. But in order to check compliance to your Technical Security Standards, expect to have to bridge the gap with your own scripting. With SSH this is infinitely feasible. With Windows, it is harder, but check Ansible and how it connects to Windows with Python.

Asset Management

Before you can assess for vulnerability, you need to know what your targets are. Thankfully Cloud comes with fewer technical barriers here. Of course the same political barriers exist as in the on-premise case, but the on-premise case presents many technical barriers in larger organisations.

Google Cloud has a built-in feature, and with AWS, each AWS Service (eg Amazon EC2, Amazon S3) have their own set of API calls and each Region is independent. AWS Config is highly useful here.

SaaS

I covered this issue in more detail in a previous post.

Remember the old times of on-premise? Admins were quite busy managing patches and other aspects of operating systems. There are not too many cases where a server is never accessed by an admin for more than a few weeks. There were incompatibilities and patch installs often came with some banana skins around dependencies.

The idea with SaaS is you hand over your operating systems to the CSP and hope for the best. So no access to SMB, RDP, or SSH. You have no visibility of patches that were installed, or not (!), and you have no idea which OS services are enabled or not. If you ask your friendly CSP for more information here, you will not get a reply, and if you do they will remind you that handed over your 50-million-lines-of-source-code OSes to them.

Here’s an example – one variant of the Conficker virus used the Windows ‘at’ scheduling service to keep itself prevalent. Now cloud providers don’t know if their customers need this or not. So – they verge on the side of danger and assume that they do. They will leave it enabled to start at VM boot up.

Note that also – SaaS instances will be invisible to credentialed VA scanners. The tool won’t be able to connect to SSH/RDP.

I am not suggesting for a moment that SaaS is bad. The cost benefits are clear. But when you moved to cloud, you saved on managing physical data centers. Perhaps consider that also saving on management of operating systems maybe taking it too far.

Patching

Don’t forget patching and look at how you are collecting and distributing patches. I’ve seen some architectures where the patching aspect is the attack vector that presents the highest danger, and there have been cases where malicious code was introduced as a result of poor patching.

The patches need to come from an integral source – this is where DNSSEC can play a part but be aware of its limitations – e.g. update.microsoft.com does not present a ‘dnskey’ Resource Record. Vendors sometimes provide a checksum or PGP cryptogram.

Some vendors do not present any patch integrity checksums at all and will force users to download a tarball. This is far from ideal and a workaround will be critical in most cases.

Redhat has their Satellite Network which will meet most organisations’ requirements.

For cloud, the best approach will usually be to ingress patches to a management VPC/Vnet, and all instances (usually even across differing code maturity level VPCs), can pull from there.

Delta Testing

Doing something like scanning critical networks for changes in advertised listening services is definitely a good idea, if not for detecting hacker shells, then for picking up on unauthorised changes. There is no feasible means to do this manually with nmap, or any other port scanner – the problem is time-outs will be flagged as a delta. Commercial offerings are cheap and allow tracking over long histories, there’s no false positives, and allow you to create your own groups of addresses.

Penetration Testing

There’s ideal state, which for most orgs is going to be something like mature vulnerability management processes (this is vulnerability assessment –> deduce risk with vulnerability –> treat risk –> repeat), and the red team pen test looks for anything you may have missed. Ideally, internal sec teams need to know pretty much everything about their network – every nook and cranny, every switch and firewall config, and then the pen test perhaps tells them things they didn’t already know.

Without these VM processes, you can still pen test but the test will be something like this: you find 40 holes of the 1000 in the sieve. But it’s worse than that, because those 40 holes will be back in 2 years.

There can be other circumstances where the pen test by independent 3rd party makes sense:

  • Compliance requirement.
  • Its better than nothing at all. i.e. you’re not even doing VA scans, let alone credentialed scans.

Wrap-up

  • It’s far from all about application security. This area was covered in part two.
  • Design a TVM capability (people, process, technology), don’t just acquire a technology (Qualys, Rapid 7, Tenable SC. etc), fill it with targets, and that’s it.
  • Use your VA data to formulate risk, then decide how to treat the risk. Repeat. Note that CVSS ratings are not particularly useful here. You need to ascertain risk for your environment, not some theoretical environment.
  • Credentialed scanning is the only solution worth considering, and indeed it’s highly likely that compliance schemes will soon start to mandate credentialed scanning.
  • Use a network delta tester to pick up on hacker shells and unauthorised changes in network services and firewalls.
  • Being dynamic with Kubernetes and microservices has not yet killed your platform risk or the OS in general.
  • SaaS may be a step too far for many, in terms of how much you can outsource.
  • When you SaaS’ify a service, you hand over the OS to a CSP, and also remove it from the scope of your TVM VA credentialed scanning.
  • Penetration testing has a well-defined place in security, which isn’t supposed to be one where it is used to inform security teams about their network! Think compliance, and what ideal state looks like here.

Fintechs and Security – Part Two

  • Prologue – covers the overall challenge at a high level
  • Part One – Recruiting and Interviews
  • Part Two – Threat and Vulnerability Management – Application Security
  • Part Three – Threat and Vulnerability Management – Other Layers
  • Part Four – Logging
  • Part Five – Cryptography and Key Management, and Identity Management
  • Part Six – Trust (network controls, such as firewalls and proxies), and Resilience

Threat and Vulnerability Management (TVM) – Application Security

This part covers some high-level guider points related to the design of the application security side of TVM (Threat and Vulnerability Management), and the more common pitfalls that plague lots of organisations, not just fintechs. I won’t be covering different tools in the SAST or DAST space apart from one known-good. There are some decent SAST tools out there but none really stand out. The market is ever-changing. When i ask vendors to explain what they mean by [new acronym] what usually results is nothing, or a blast of obfuscation. So I’m not here to talk about specific vendor offerings, especially as the SAST challenge is so hard to get even close to right.

With vulnerability management in general, ${VENDOR} has succeeded in fouling the waters by claiming to be able to automate vulnerability management. This is nonsense. Vulnerability assessment can to some limited degree be automated with decent results, but vulnerability management cannot be automated.

The vulnerability management cycle has also been made more complicated by GRC folk who will present a diagram representing a cycle with 100 steps, when really its just assess –> deduce risk –> treat risk –> GOTO 1. The process is endless, and in the beginning it will be painful, but if handled without redundant theory, acronyms-for-the-sake-of-acronyms-for-the-same-concept-that-already-has-lots-of-acronyms, rebadging older concepts with a new name to make them seem revolutionary, or other common obfuscation techniques, it can be easily integrated as an operational process fairly quickly.

The Dawn Of Application Security

If you go back to the heady days of the late 90s, application security was a thing, it just wasn’t called “application security”. It was called penetration testing. Around the early 2000s, firewall configurations improved to the extent that in a pen test, you would only “see” port 80 and/or 443 exposing a web service on Apache, Internet Information Server, or iPlanet (those were the days – buffer overflow nirvana). So with other attack channels being closed from the perimeter perspective, more scrutiny was given to web-based services.

Attackers realised you can subvert user input by intercepting it with a proxy, modifying some fields, perhaps inject some SQL or HTML, and see output that perhaps you wouldn’t expect to see as part of the business goals of the online service.

At this point the “application security” world was formed and vulnerabilities were classified and given new names. The OWASP Top Ten was born, and the world has never been the same since.

SAST/DAST

More acronyms have been invented by ${VENDOR} since the early early pre-holocene days of appsec, supposedly representing “brand new” concepts such as SAST (Static Application Security Testing) and DAST (Dynamic Application Security Testing), which is the new equivalent of white box and black box testing respectively. The basic difference is about access to the source code. SAST is source code testing while DAST is an approach that will involve testing for OWASP type vulnerabilities while the software is running and accepting client connection requests.

The SAST scene is one that has been adopted by fintechs in more recent times. If you go back 15 years, you would struggle to find any real commercial interest in doing SAST – so if anyone ever tells you they have “20” or even “10” years of SAST experience, suggest they improve their creativity skills. The general feeling, not unjustified, was that for a large, complex application, assessing thousands of lines of source code at a vital organ/day couldn’t be justified.

SAST is more of a common requirement these days. Why is that? The rise of fintechs, whose business is solely about generation of applications, is one side of it, and fintechs can (and do) go bust if they suffer a breach. Also – ${VENDOR}s have responded to the changing Appsec landscape by offering “solutions”. To be fair, the offerings ARE better than 10 years ago, but it wouldn’t take much to better those Hello World scripts. No but seriously, SAST assessment tools are raved about by Gartner and other independent sources, and they ARE better than offerings from the Victorian era, but only in certain refined scenarios and with certain programming languages.

If it was possible to be able to comprehensively assess lots of source code for vulnerability and get accurate results, then theoretically DAST would be harder to justify as a business undertaking. But as it is, SAST + DAST, despite the extensive resources required to do this effectively, can be justified in some cases. In other cases it can be perfectly fine to just go with DAST. It’s unlikely ever going to be ok to just go with SAST because of the scale of the task with complex apps.

Another point here – i see some fintechs using more than one SAST tool, from different vendors. There’s usually not much to gain from this. Some tools are better with some programming languages than others, but there is nothing cast in stone or any kind of majority-view here. The costs of going with multiple vendors is likely going to be harder and harder to justify as time goes on.

Does Automated Vulnerability Assessment Help?

The problem of appsec is still too complex for decent returns from automation. Anyone who has ever done any manual testing for issues such as XSS knows the vast myriad of ways in which such issues can be manifested. The blackbox/blind/DAST scene is still not more than Burp, Dirbuster, but even then its mostly still manual testing with proxies. Don’t expect to cover all OWASP top 10 issues for a complex application that presents an admin plus a user interface, even in a two-week engagement with four analysts.

My preferred approach is still Fred Flinstone’y, but since the automation just isn’t there yet, maybe its the best approach? This needs to happen when an application is still in the conceptual white board architecture design phase, not a fully grown [insert Hipster-given-name], and it goes something like this: white board, application architect – zero in on the areas where data flows involve transactions with untrusted networks or users. Crpyto/key management is another area to zoom in on.

Web Application Firewall

The best thing about WAFs, is they only allow propagation of the most dangerous attacks. But seriously, WAF can help, and in some respects, given the above-mentioned challenges of automating code testing, you need all the help you can get, but you need to spend time teaching the WAF about your expected URL patterns and tuning it – this can be costly. A “dumb” default-configured WAF can probably catch drive-by type issues for public disclosed vulnerabilities as long as you keep it updated. A lot depends on your risk profile, but note that you don’t need a security engineer to install a WAF and leave it in default config. Pretty much anyone can do this. You _do_ need an experienced security engineer or two to properly understand an application and configure a WAF accordingly.

Python and Ruby – Web Application Frameworks

Web application frameworks such as Ruby on Rails (RoR) and Django are in common usage in fintechs, and are at least in some cases, developed with security in mind in that they do offer developers features that are on by default. For example, with Django, if you design a HTML form for user input, the server side will have been automagically configured with the validation on the server side, depending on the model field type. So an email address will be validated client and server-side as an email address. Most OWASP issues are the result of failures to validate user input on the server side.

Note also though that with Django you can still disable HTML tag filtering of user input with a “| safe” in the template. So it’s dangerous to assume that all user input is sanitised.

In Django Templates you will also see a CSRF token as a hidden form field if you include a Form object in your template.

The point here is – the root of all evil in appsec is server-side validation, and much of your server-side validation effort in development will be covered by default if you go with RoR or Django. That is not the end of the story though with appsec and Django/RoR apps. Vulnerability of the host OS and applications can be problematic, and it’s far from the case that use of either Django or RoR as a dev framework eliminates the need for DAST/SAST. However the effort will be significantly reduced as compared to the C/Java/PHP cases.

Wrap-up

Overall i don’t want to too take much time bleating about this topic because the take away is clear – you CAN take steps to address application security assessment automation and include the testing as part of your CI/CD pipeline, but don’t expect to catch all vulnerabilities or even half of what is likely an endless list.

Expect that you will be compromised and plan for it – this is cheaper than spending zillions (e.g. by going with multiple SAST tools as i’ve seen plenty of times) on solving an unsolvable problem – just don’t let an incident result in a costly breach. This is the purpose of security architecture and engineering. It’s more to deal with the consequences of an initial exploit of an application security fail, than to eliminate vulnerability.

Fintechs and Security – Part One

  • Prologue – covers the overall challenge at a high level
  • Part One – Recruiting and Interviews
  • Part Two – Threat and Vulnerability Management – Application Security
  • Part Three – Threat and Vulnerability Management – Other Layers
  • Part Four – Logging
  • Part Five – Cryptography and Key Management, and Identity Management
  • Part Six – Trust (network controls, such as firewalls and proxies), and Resilience

Recruiting and Interviews

In the prologue of this four-stage process, I set the scene for what may come to pass in my attempt to relate my experiences with fintechs, based on what i am hearing on the street and what i’ve seen myself. In this next instalment, i look at how fintechs are approaching the hiring conundrum when it comes to hiring security specialists, and how, based on typical requirements, things could maybe be improved.

The most common fintech setup is one of public-cloud (AWS, Azure, GCP, etc), They’re developing, or have developed, software for deployment in cloud, with a mobile/web front end. They use devops tools to deploy code, manage and scale (e.g. Kubernetes), collaborate (Git variants) and manage infrastructure (Ansible, Terraform, etc), perhaps they do some SAST. Sometimes they even have different Virtual Private Clouds (VPCs) for different levels of code maturity, one for testing, and one for management. And third party connections with APIs are not uncommon.

Common Pitfalls

  • Fintechs adopt the stance: “we don’t need outside help because we have hipsters. They use acronyms and seem quite confident, and they’re telling me they can handle it”. While not impossible that this can work – its unlikely that a few devops peeps can give a fintech the help they need – this will become apparent later.
  • Using devops staff to interview security engineers. More on this problem later.
  • Testing security engineers with a list of pre-prepared questions. This is unlikely to not end in tears for the fintech. Security is too wide and deep an area for this approach. Fintechs will be rejecting a lot of good candidates by doing this. Just have a chat! For example, ask the candidate their opinions on the usefulness of VA scanners. The length of the response is as important as its technical accuracy. A long response gives an indication of passion for the field.
  • Getting on the security bandwagon too late (such as when you’re already in production!) you are looking at two choices – engage an experienced security hand and ignore their advice, or do not ignore their advice and face downtime, and massive disruption. Most will choose the first option and run the project at massive business risk.

The Security Challenge

Infosec is important, just as checking to see if cars are approaching before crossing the road is important. And the complexity of infosec mandates architecture. Civil engineering projects use architecture. There’s a good reason for that – which doesn’t need elaborating on.

Collapsing buildingWhenever you are trying to build something complex with lots of moving parts, architecture is used to reduce the problem down to a manageable size, and help to build good practices in risk management. The end goal is protective monitoring of an infrastructure that is built with requirements for meeting both risk and compliance challenges.

Because of the complexity of the challenge, it’s good to split the challenge into manageable parts. This doesn’t require talking endlessly about frameworks such as SABSA. But the following six capabilities (people, process, technology) approach is sleek and low-footprint enough for fintechs:

  • Threat and Vulnerability Management (TVM)
  • Logging – not “telemetry” or Threat intelligence, or threat hunting. Just logging. Not even necessarily SIEM.
  • Cryptography and Key Management
  • Identity Management
  • Business Continuity Management
  • Trust (network segmentation, firewalls, proxies).

I will cover these 6 areas in the next two articles, in more detail.

The above mentioned capabilities have an engineering and architecture component and cover very briefly the roles of security engineers and architects. A SABSA based approach without the SABSA theory can work. So an architect takes into account risk (maybe with a threat modelling approach) and compliance goals in a High Level Design (HLD), and generates requirements for the Low Level Design (LLD), which will be compiled by a security engineer. The LLD gives a breakdown of security controls to meet the requirements of the HLD, and how to configure the controls.

Security Engineers and Devops Tools

What happens when a devops peep interviews a security peep? Well – they only have their frame of reference to go by. They will of course ask questions about devops tools. How useful is this approach? Not very. Is this is good test of a security engineer? Based on the security requirements for fintechs, the answer is clear.

Security engineers can use devops tools, and they do, and it doesn’t take a 2 week training course to learn Ansible. There is no great mystery in Kubernetes. If you hire a security engineer with the right background (see the previous post in this series) they will adapt easily. The word on the street is that Terraform config isn’t the greatest mystery in the world and as long as you know Linux, and can understand what the purpose of the tool is (how it fits in, what is the expected result), the time taken to get productive is one day or less.

The point is: if i’m a security engineer and i need to, for example, setup a cloud SIEM collector: some fintechs will use one Infrastructure As Code (IaC) tool, others use another one – one will use Chef, another Ansible, and there are other permutations. Is a lack of familiarity with the tool a barrier to progress? No. So why would you test a security engineer’s suitability for a fintech role by asking questions about e.g. stanzas in Ansible config? You need to ask them questions about the six capabilities I mentioned above – i.e. security questions for a security professional.

Security Engineers and Clouds

Again – what was the transition period from on-premise to cloud? Lets take an example – I know how networking works on-premise. How does it work in cloud? There is this thing called a firewall on-premise. In Azure it’s called a Network Security Group. In AWS its called a …drum roll…firewall. In Google Cloud its called a …firewall. From the web-based portal UI for admin, these appear to filter by source and destination addresses and services, just like an actual non-virtual firewall. They can also filter by service account (GCP), or VM tag.

There is another thing called VPN. And another thing called a Virtual Router. On the world of on-premise, a VPN is a …VPN. A virtual router is a…router. There might be a connection here!

Cloud Service Providers (CSP) in general don’t re-write IT from the ground up. They still use TCP/IP. They host virtual machines (VM) instead of real machines, but as VMs have operating systems which security engineers (with the right background) are familiar with, where is the complication here?

The areas that are quite new compared to anything on-premise are areas where the CSP has provided some technology for a security capability such as SIEM, secrets management, or Identity Management. But these are usually sub-standard for the purpose they were designed for – this is deliberate – the CSPs want to work with Commercial Off The Shelf (COTS) vendors such as Splunk and Qualys, who will provide a IaaS or SaaS solution.

There is also the subject of different clouds. I see some organisations being fussy about this, e.g. a security engineer who worked a lot with Azure but not AWS, is not suitable for a fintech that uses AWS. Apparently. Well, given that the transition from on-premise to cloud was relatively painless, how painful is it to transition from Azure to AWS or …? I was on a project last summer where the fintech used Google Cloud Platform. It was my first date with GCP but I had worked with AWS and Azure before. Was it a problem? No. Do i have an IQ of 160? Hell no!

The Wrap-up

Problems we see in fintech infosec hiring represent what is most likely a lack of understanding of how they can best manage risk with a budget that is considerably less than a large MNC for example. But in security we haven’t been particularly helpful for fintechs – the problem is on us.

The security challenge for fintechs is not just about SAST/DAST of their code. The challenge is wider and be represented as six security capabilities that need to be designed with an architecture and engineering view. This sounds expensive, but its a one-off design process that can be covered in a few weeks. The on-going security challenge, whereby capabilities are pushed through into the final security operations stage, can be realised with one or two security engineers.

The lack of understanding of requirements in security leads to some poor hiring practices, the most common of which is to interview a security engineer with a devops guru. The fintech will be rejecting lots of good security engineers with this approach.

In so many ways, the growth of small to medium development houses has exposed the weaknesses in the infosec sector more than they were ever exposed with large organisations. The lack of the sector’s ability to help fintechs exposes a fundamental lack of skilled personnel, more particularly at the strategic/advisory level than others.

Fintechs and Security – Prologue

  • Prologue – covers the overall challenge at a high level
  • Part One – Recruiting and Interviews
  • Part Two – Threat and Vulnerability Management – Application Security
  • Part Three – Threat and Vulnerability Management – Other Layers
  • Part Four – Logging
  • Part Five – Cryptography and Key Management, and Identity Management
  • Part Six – Trust (network controls, such as firewalls and proxies), and Resilience

Fintechs and Security – A Match Made In Heaven?

Well, no. Far from it actually. But again, as i’ve been repeating for 20 years now, its not on the fintechs. It’s on us in infosec, and infosec has to take responsibility for these problems in order to change. If i’m a CTO of a fintech, I would be confused at the array of opinions and advice which vary radically from one expert to another

But there shouldn’t be such confusion with fintech challenges. Confusion only reigns where there’s FUD. FUD manifests itself in the form of over-lengthy coverage and excessive focus on “controls” (the archetypal shopping list of controls to be applied regardless of risk – expensive), GRC, and “hacking/”[red,blue,purple,yellow,magenta/teal/slate grey] team”/”appsec.

Really what’s needed is something like this (in order):

  • Threat modelling lite – a one off, reviewed periodically.
  • Architecture lite – a one off, review periodically.
  • Engineering lite – a one off, review periodically.
  • Secops lite – the result of the previous 3 – an on-going protective monitoring capability, the first level of monitoring and response for which can be outsourced to a Managed Service Provider.

I will cover these areas in more details in later episodes but what’s needed is, for example, a security design that only provides the answer to “What is the problem? How are we going to solve it?” – so a SIEM capability design for example – not more than 20 pages. No theory. Not even any justifications. And one that can be consumed by non-security folk (i.e. it’s written in the language of business and IT).

Fintechs and SMBs – How Is The Infosec Challenge Unique?

With a lower budget, there is less room for error. Poor security advice can co-exist with business almost seamlessly in the case of larger organisations. Not so with fintechs and Small and Medium Businesses (SMBs). There has been cases of SMBs going under as a result of a security incident, whereas larger businesses don’t even see a hit on their share price.

Look For A Generalist – They Do Exist!

The term “generalist” is seen as a four-letter word in some infosec circles. But it is possible for one or two generalists to cover the needs of a fintech at green-field, and then going forward into operations, its not unrealistic to work with one in-house security engineer of the right background, the key ingredients of which are:

  • Spent at least 5 years in IT, in a complex production environment, and outgrew the role.
  • Has flexibility – the old example still applies today – a Unix fan has tinkered with Windows. So i.e. a technology lover. One who has shown interest in networking even though they’re not a network engineer by trade. Or one who sought to improve efficiency by automating a task with shell scripting.
  • Has an attack mindset – without this, how can they evaluate risk or confidently justify a safeguard?

I have seen some crazy specialisations in larger organisations e.g. “Websense Security Engineer”! If fintechs approached security staffing in the same way as larger organisations, they would have more security staff than developers which is of course ridiculous.

So What’s Next?

In “On Hiring For DevSecOps” I covered some common pitfalls in hiring and explained the role of a security engineer and architect.

There are “fallback” or “retreat” positions in larger organisations and fintechs alike, wherein executive decisions are made to reduce the effort down to a less-than-advisable position:

  • Larger organisations: compliance driven strategy as opposed to risk based strategy. Because of a lack of trustworthy security input, execs end up saying “OK i give up, what’s the bottom line of what’s absolutely needed?”
  • Fintechs: Application security. The connection is made with application development and application security – which is quite valid but the challenge is wider. Again, the only blame i would attribute here is with infosec. Having said that, i noticed this year that “threat modelling” has started to creep into job descriptions for Security Engineers.

So for later episodes – of course the areas to cover in security are wider than appsec, but again there is no great complication or drama or arm-waiving:

  • Part One – Hiring and Interviews – I expand on “On Hiring For DevSecOps“. I noticed some disturbing trends in 2019 and i cover these in some more detail.
  • Part Two – Security Architecture and Engineering I – Threat and Vulnerability Management (TVM)
  • Part Three – Security Architecture and Engineering II – Logging (not necessarily SIEM). No Threat Hunting, Telemetry, or Threat “Intelligence”. No. Just logging. This is as sexy as it needs to be. Any more sexy than this should be illegal.
  • Part Four – Security Architecture and Engineering III – Identity Management (IDAM) and Cryptography and Key Management (CKM).
  • Part Five – Security Architecture and Engineering IV – Trust (network trust boundary controls – e.g. firewalls and forward proxies), and Business Resilience Management (BRM).

I will try and get the first episode on hiring and interviewing out before 2020 hits us but i can’t make any promises!

On Hiring For DevSecOps

Based on personal experience, and second hand reports, there’s still some confusion out there that results in lots of wasted time for job seekers, hiring organisations, and recruitment agents.

There is a want or a need to blame recruiters for any hiring difficulties, but we need to stop that. There are some who try to do the right thing but are limited by a lack of any sector experience. Others have been inspired by Wolf Of Wall Street while trying to sound like Simon Cowell.

It’s on the hiring organisation? Well, it is, but let’s take responsibility for the problem as a sector for a change. Infosec likes to shift responsibility and not take ownership of the problem. We blame CEOs, users, vendors, recruiters, dogs, cats, “Russia“, “China” – anyone but ourselves. Could it be we failed as a sector to raise awareness, both internally and externally?

So What Are Common Understandings Of Security Roles?

After 25 years+ we still don’t have universally accepted role descriptions, but at least we can say that some patterns are emerging. Security roles involve looking at risk holistically, and sometimes advising on how to deal with risk:

  • Security Engineers assess risk and design and sometimes also implement controls. BTW some sectors, legal in particular, still struggle with this. Someone who installs security products is in an IT ops role. Someone who upgrades and maintains a firewall is an IT ops role. The fact that a firewall is a security control doesn’t make this a security engineering function.
  • Security Architects take risk and compliance goals into account when they formulate requirements for engineers.
  • Security Analysts are usually level 2 SOC analysts, who make risk assessments in response to an alert or vulnerability, and act accordingly.

This subject evokes as much emotion as CISSP. There are lots of opinions out there. We owe to ourselves to be objective. There are plenty of sources of information on these role definitions.

No Aspect Of Risk Assessment != Security. This is Devops.

If there is no aspect of risk involved with a role, you shouldn’t looking for a security professional. You are looking for DEVOPS peeps. Not security peeps.

If you want a resource to install and configure tools in cloud – that is DEVOPS. It is not Devsecops. It is not Security Engineering or Architecture. It is not Landscape Architecture or Accounting. It is not Professional Dog Walker. it is DEVOPS. And you should hire a DEVOPS person. If you want a resource to install and configure appsec tools for CI/CD – that is DEVOPS. If you want a resource to advise on or address findings from appsec tools, that is a Security Analyst in the first case, DEVSECOPS in the 2nd case. In the 2nd case you can hire a security bod with coding experience – they do exist.

Ok Then So What Does A DevSecOps Beast Look Like?

DevSecOps peeps have an attack mindset from their time served in appsec/pen testing, and are able to take on board the holistic view of risk across multiple technologies. They are also coders, and can easily adapt to and learn multiple different devops tools. This is not a role for newly graduated peeps.

Doing Security With Non-Security Professionals Is At Best Highly Expensive

Another important point: what usually happens because of the skills gap in infosec:

  • Cloud: devops fills the gap.
  • On-premise: Network Engineers fill the gap.

Why doesn’t this work? I’ve met lots of folk who wear the aforementioned badges. Lots of them understand what security controls are for. Lots of them understand what XSS is. But what none of them understand is risk. That only comes from having an attack mindset. The result will be overspend usually – every security control ever conceived by humans will be deployed, while also having an infrastructure that’s full of holes (e.g. default install IDS and WAF is generally fairly useless and comes with a high price tag).

Vulnerability assessment is heavily impacted by not engaging security peeps. Devops peeps can deploy code testing tools and interpret the output. But a lack of a holistic view or an attack mindset, will result in either no response to the vulnerability, or an excessive response. Basically, the Threat And Vulnerability Management capability is broken under these circumstances – a sadly very common scenario.

SIEM/Logging is heavily impacted – what will happen is either nothing (default logging – “we have Stackdriver, we’re ok”), or a SIEM tool will be provisioned which becomes a black hole for events and also budgets. All possible events are configured from every log source. Not so great. No custom use cases will be developed. The capability will cost zillions while also not alerting when something bad is going down.

Identity Management – is not deploying a ForgeRock (please know what you’re getting into with this – its a fork of Sun Microsystems/Oracle’s identity management show) or an Azure AD and that’s it, job done. If you just deploy this with no thought of the problem you’re trying to solve in identity management, you will be fired.

One of the classic risk problems that emerges when no security input is taken: “there is no personally identifiable information in development Virtual Private Clouds, so there is no need for security controls”. Well – intelligence vulnerability such as database schema – attackers love this. And don’t you want your code to be safe and available?

You see a pattern here. It’s all or nothing. Either of which ends up being very expensive or worse. But actually come to think of it, expensive is the goal in some cases. Hold that thought maybe.

A Final Word

So – if the word risk doesn’t appear anywhere in the job description, it is nothing to do with security. You are looking for devops peeps in this case. And – security is an important consideration for cloud migrations.

Exorcising Dark Reading’s Cloud Demons

Dark Reading recently covered what it says are Cloud “blind spots”. Really, there is some considerable FUD here. Is there really anything unique to cloud with the issues called out?

I’m not pro or anti-cloud. It is “just another organisation’s computer platform” as they say, and whereas this phrase was coined as a warning about cloud, it also serves as a reminder that the differences with on-premise aren’t as much as many would have you believe. In some areas cloud makes things easier and its a chance to go greed field and fix the architecture. Some fairly new concepts such as “immutable architecture” will change security architecture radically but there is nothing carved in stone which says this can only be implemented in cloud. In case it needed calling out – you can implement microservices in your data centre if microservices are what does it for you!

Migrating to cloud – there’s a lot to talk about regards infosec, and i can’t make an attempt at doing that comprehensively here. But some points to make based on what seems to be popular misconceptions:

  • Remember you will never get access to the Cloud Service Provider’s (CSP) hypervisors. Your hardware is in their hands. Check your SLA’s and contract terms.
  • SaaS – it many cases it can be bad to hand over your operating systems to CSPs, and not just from a security perspective. In the case of on-premise, it was deemed a good business choice to have skilled staff to administer complex resources that present many configuration options that can be used and abused for fun and profit. So why does moving to cloud change that?
  • Saas and SIEM/VA: Remember this now means you lose most of your Vulnerability Assessment coverage. And SaaS and SIEM is getting more popular. Due to the critical nature of a SIEM manager, with trust relationships across the board, personally i want access to the OS of such a critical device, but that’s just me.

So picking out the areas covered briefly….

  • Multi-cloud purchasing – “The problem for security professionals is that security models and controls vary widely across providers” – if it takes a few weeks to switch from on-premise to Cloud, then for example, from Azure to Google, or AWS, chances are they were struggling with on-premise. Sorry but there’s no “ignorance” here. A machine is now a VM but it still speaks TCP/IP, and a firewall is now not something in the basement…ok i’ll leave it there.
  • Hybrid Architecture – tracking assets is easier in cloud than it is on-premise. If they find it hard to track assets in cloud, they certainly find it hard on-premise. Same for “monitoring activity”.
  • Likewise with Cloud Misconfiguration – they are also finding it hard on-premise if they struggle with Cloud.
  • Business Managed IT – not a cloud specific problem.
  • Containers – “new platforms like Kubernetes are introducing new classes of misconfigurations and vulnerabilities to cloud environments faster than security teams can even wrap their arms around how container technology works. ” – WHAT!!? So does the world hold back on these devops initiatives while infosec plays catch up? There is “devsecops” which is supposed to be the area of security of security which is specialised in devops. If they also struggle, then what does it say about security? I have to say that on a recent banking project, most of the security team would certainly not know what Docker is. This is not a problem with Cloud, its a problem with security professionals coming into the field with the wrong background.
  • Dark Data – now you’re just taking the proverbial.
  • Forensics and Threat-Hunting Telemetry – show yourself Satan! Don’t lurk in the shadows! Ignore the fancy title – this is about logging. “Not only do organizations struggle to get the right information fed from all of their different cloud resources” – This is a logging problem, not a cloud problem. Even SaaS SIEM solutions do not present customers with issues getting hold of log data.

What happened Dark Reading? You used to be good. Why does’t thou vex us so? Cloud is just someone else’s computer, and just someone else’s network – there are a few unique challenges with cloud migrations, but none of those are mentioned here.