lxndryng - a blog about nothing

VS Code, Python and Fedora on WSL

Jan 27, 2022

When trying to set up my development environment for Python using WSL and VS Code, I found that I was unable to select an interpreter from the list in VS Code, as the list was flickering at a rate that made such a selection impossible. I found that the following was being spammed into the VS Code terminal:

bash: which: line 1: syntax error: unexpected end of file
bash: error importing function definition for `which'
/usr/bin/sh: which: line 1: syntax error: unexpected end of file
/usr/bin/sh: error importing function definition for `which'

On checking the Fedora rootfs for anything that redefines the which command, I found /etc/profile.d/which2.sh, which aliases which to include aliases and functions defined in the shell. The VS Code Python plugin cannot parse the output of this aliased which, causing the error. Deleting /etc/profile.d/which2.sh solves the problem - until the next which package update, at least.
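As a minimal sketch of the workaround (the guard is mine; the file path is the one found above, and it will reappear when the which package is next updated):

```shell
# Remove the profile script that redefines `which` as a shell function
# the VS Code Python extension cannot parse. No-op if already removed.
WHICH2=/etc/profile.d/which2.sh
if [ -e "$WHICH2" ]; then
    sudo rm "$WHICH2"
fi
```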

Boss Waza Air case - unofficial

Jan 26, 2022

I recently purchased the Boss Waza Air headphone/amp modeller and wanted to not pay the full £24.99 for the official case from Boss, while accepting the need I had for a case given the careless abandon with which I treat anything that looks like a pair of headphones. Compounding my woes, the official case wasn't likely to be in stock anywhere in the UK for several months.

I chanced upon the iGadgitz U3804 EVA Hard Case Cover on Amazon, and from the reviews it appeared that someone had already taken the leap of using it with the Waza Air. £11.99 later and I have what seems to be a very sturdy case to protect the headphones.

Linux on a Gigabyte Aero 14

Dec 16, 2018

Another 'new' laptop, another set of Nvidia Optimus graphics issues to mess around with, all for the sake of a GTX1060 in a laptop that's probably a little too heavy to be an everyday carry. This laptop has the additional element of fun that the touchpad doesn't work by default without a kernel option being passed at boot.

So, to resolve the touchpad issue, i8042.kbdreset=1 needs to be passed to the kernel at boot, and to resolve the issue of the Intel graphics card causing a hard lock on boot, the options acpi_osi=! acpi_osi="Windows 2009" need to be passed.
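To make the options persist across reboots, one approach (a sketch assuming a GRUB-based distribution; your existing command line options are elided here) is to append them to the kernel command line in /etc/default/grub:

```
# /etc/default/grub
GRUB_CMDLINE_LINUX="... i8042.kbdreset=1 acpi_osi=! acpi_osi=\"Windows 2009\""
```

followed by regenerating the GRUB configuration (grub2-mkconfig -o /boot/grub2/grub.cfg on Fedora-like systems, update-grub on Debian-like ones).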

This assumes that you're using Nouveau and not the proprietary Nvidia drivers. I have no desire to use the latter, so any advice I'd give there would be worthless.

The Need for Banking APIs

May 01, 2017

I've misspent a bank holiday weekend trying to make it a little easier to manage my money without having to turn to a plethora of different devices for different pieces of information. The workflow that I have at present involves a mobile phone, a keyfob and between three and five passwords stored in a variety of password managers: clearly, this setup is not something that I particularly ever want to deal with when I just want to quickly check up on my investments, or to handle the "oh, I've just been paid, I should do my monthly financial tasks" inevitability at the end of each month. I've developed automation for Hargreaves Lansdown, but the security policies of the other organisations I perform financial transactions with don't permit me to take control of my financial affairs in an automated way.

The UK context for opening up banking data

People have been making noise about the lack of APIs in banking, with Payments UK establishing the Open Banking Implementation Entity to develop a set of standards that would be agreeable to banks operating in the UK.

This entity hasn't published meeting notes since October 2016, so who knows what's happening in that space now - given that it was an initiative involving the only people whose IT moves slower than that of government, probably nowhere.

This does seem to be a little bit of a deaf, dumb and blind approach though: projecting massively onto the rest of the population, I don't necessarily need something fancy in this space. The majority of banks provide exports to comma-separated values, Quicken and Microsoft Money formats, which I can then readily interrogate for any information. The issue is that I usually would have to navigate an online bank account interface that hasn't been updated since HTML tables were considered gauche, and I'd have to handle the authentication step of using a multi-factor authentication token, something that can't readily be abstracted away from the concrete implementation of each bank's token generator.

In terms of what is 'real' in this space at present, the API offerings are generally limited to a branch locator, an ATM locator and a product search API (as implemented by RBS and HSBC's banking brands). I appreciate the opening up of this data, but I can already obtain this location data from the Google Maps API, and I don't really want to (as an end-user) automate my product selection, given how creative with the truth banks can be about what is truly offered. There seems to be a gulf between what customers really need and the minimum that the banks could agree on between themselves. Of course, this is just the cost of trying to get the elephants of the financial sector to move away from the oases they've always known.

So why not do it manually?

I don't want to.

I guess that is the crux of it: I could manually go into the portals of each of my financial service providers and fetch a CSV file, put it somewhere and process it in any way I choose. But I don't want to. I don't want to be beholden to what financial service providers feel I should be able to do with my financial data. Of course, that's always the cost of doing business with anyone, but that would never stop me from being sore about it.

The spectre of multi-factor authentication and corporate inertia

Large corporates aren't the smartest when it comes to security in their customer-facing applications, and I think it would be naive to assert that large financial institutions would be immune to either 1) outright stupidity, as in the linked examples, or 2) groupthink that serves to permeate the entirety of a profession within an organisation.

In the context of multi-factor authentication used by banks, (2) is far more likely to be an issue in providing a good, automatable and secure API service to customers. The typical enterprise "these are the processes we have, they are immutable" inertia and subsequent ennui would be likely to set in: our current service is 'secure', so why would we do anything else? I've seen this time and again throughout my career and it seems to be something that no large corporate is immune to.

The hope we have to have here is that someone explains to the banks how the likes of Amazon's IAM, OAuth or any number of other token-based authentication methods work. I've never had any more faith in a mobile app-based multi-factor authentication token generator than in even the simplest of JSON Web Token generators, so hopefully others could come around to a similar realisation.

Is there hope for the future?

As far as I can see, my hopes are all pretty much in one basket, and it's not one I'm comfortable with: I'm not one to pin my hopes for change on a so-called 'disruptive' startup; and I'm certainly not one to hope for 'market forces' to pressure the larger players to compete with relatively niche service offerings. That said, Monzo recently being given a banking license, combined with the commitment to their APIs and integration platform that they've demonstrated throughout Beta, does give me some hope. If nothing else, it is a differentiator which may shape the choices I make over who I bank with.

On the investments front, not even Nutmeg appear to want to do anything in terms of exposing APIs to customers, so I may just have to make do with my own wranglings in that space.

Using physical devices in VirtualBox

Apr 12, 2017

Sometimes it may be useful to, for example, access a physical Linux installation on a device from within another physically-installed operating system on the same device. Fortunately, this is possible with VirtualBox's VBoxManage command. An example of such a command is given below:

VBoxManage internalcommands createrawvmdk -filename /path/to/file.vmdk -rawdisk /dev/sda

On Windows, the argument for the -rawdisk switch should take the form of \\.\PhysicalDrive0, where the disk identifier can be found using diskpart's LIST DISK command. The VBoxManage command needs to be run as Administrator, and any virtual machines launched using the VMDK produced also require VirtualBox to be run as Administrator in order to be able to access the disk.
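Put together, the Windows equivalent looks something like the below (a sketch - the VMDK path and drive number are illustrative, so check LIST DISK before pointing this at a real disk):

```
VBoxManage internalcommands createrawvmdk -filename C:\vms\physical-linux.vmdk -rawdisk \\.\PhysicalDrive1
```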

Building a Docker Container for Taiga

Mar 11, 2017

It's no secret to people who know me that I am not the most organised person in the world when it comes to my personal life: far too often, things can wait until... well, until I forget about them. As part of a general bid to be more proactive about the things I want to get done in my free time, I had a look at the market for open-source project management software (the people who use Jira at work extensively always seem to be the most organised, but I'm not paying for this experiment into my own sloth) and came out wanting to give Taiga a try: being a Python application, I'd be able to extend it with a minimum of effort if there were some piece of obscura I wished to contribute to it. Of course, my compulsion towards self-hosting all of the web-based tools I use meant that the second half of the question would be to find a means by which I could easily deploy, upgrade and manage it.

Enter Docker. I'd initially found some Docker images on Docker Hub that worked and, in a jovial fit of inattention, proceeded to use them without quite realising how old they were. Eventually, I noticed that they were last built nineteen months ago, for a project that has a fairly rapid release cadence. Fortunately, the creator of those images had published their Dockerfiles and configuration on GitHub; unfortunately, however, that configuration was itself out of date given recent changes in the supporting libraries for Taiga. The option of looking for other people's Docker containers, of course, did not occur to me, so I endeavoured to update and expand upon the work that had been done previously.

Taiga's architecture

Taiga consists of a frontend application written in Angular.js (I'm not a frontend person - I couldn't tell you if it was Angular 1 or Angular 2) and a backend application based on the Django framework. The database is a PostgreSQL database, nothing really fancy about it.

A half-done transformation

Looking at the code used to generate the Docker images, I noticed that there was a discrepancy between several of the paths used in building the interface between the frontend and backend applications: in the backend application, everything seemed to point towards /usr/src/app/taiga-back/, whereas in the frontend application, references were made to /taiga. This dated from the backend application being built around the python base image, before being changed to python-onbuild. The -onbuild variety of the image gives some convenience methods around running pip install -r requirements.txt without manual intervention, which I can see as a worthwhile bit of effort in terms of making future updates to the image easier. Unfortunately, it does change the path of your application: something that hadn't been fixed up to now. Fortunately, a trivial change of the frontend paths to /usr/src/app/taiga-back solved the issue.

Le temps détruit tout

Some time between the last time the previous author pushed his git repository to GitHub and now, the version of Django used by Taiga changed, introducing some breaking module name changes. The Postgres database backend module changed from transaction_hooks.backends.postgresql to django.db.backends.postgresql, with the new value having to be declared in the settings file that was to be injected into the backend container.
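In terms of the injected settings file, the change looks something like the below (only the ENGINE value is the actual fix; the other keys are illustrative, not Taiga's exact defaults):

```python
# Fragment of the Taiga backend settings injected into the container.
# The ENGINE module changed with the Django upgrade:
# was "transaction_hooks.backends.postgresql".
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "taiga",
        "HOST": "postgres",
        "PORT": "5432",
    }
}
```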

Doing something sensible about data

Taiga allows users to upload files to support the user stories and features catalogued within the tool, putting these files in a subdirectory of the backend application's working directory. Now, if we're to take our containers to be immutable and replaceable, this just won't do: the deletion of the container would result in the deletion of all data therein. Given that the Postgres container was set up to store its data on the filesystem of the host, outside of the container, it's a little odd that the backend application didn't have the same consideration taken into account. Declaring the media and static directories within the application to be VOLUMEs in the Dockerfile resolved this issue.
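In Dockerfile terms, that comes down to a single instruction (a sketch; the directory names are assumed from the backend paths discussed above):

```
# Persist uploaded media and collected static assets outside the container
VOLUME ["/usr/src/app/taiga-back/media", "/usr/src/app/taiga-back/static"]
```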

Don't make assumptions about how this will be deployed

In the original repository, the ports and whether HTTPS was being used for communication between the front and backend had been hard-coded into the JSON configuration for the frontend application: it was HTTP (rather than HTTPS) on port 8000. Now, if one were to deploy this onto a device running SELinux with the default policy, setting up a reverse proxy to terminate SSL would have been impossible because of the expectation that port 8000 would only be used by soundd - with anything else trying to bind to that port being told that it can't. To remedy this, I made the port and protocol being used configurable from environment variables at the time of container instantiation.


The repository put together previously contained, as well as the Dockerfiles for generation of the images, scripts to deploy the images together and have the application work. It did not, however, have any consideration of how an upgrade could work. With that in mind, I put together a script that would pull the latest versions of the images I'd put together, tear down the existing containers, stand up new ones and run any necessary database migrations. Nothing more complex than the below:


#!/bin/bash
# Refuse to run without the deployment configuration set
if [[ -z "$API_NAME" ]]; then echo "API_NAME not set"; exit 1; fi
if [[ -z "$API_PORT" ]]; then echo "API_PORT not set"; exit 1; fi
if [[ -z "$API_PROTOCOL" ]]; then echo "API_PROTOCOL not set"; exit 1; fi

docker pull lxndryng/taiga-back
docker pull lxndryng/taiga-front
docker stop taiga-back taiga-front
docker rm taiga-back taiga-front
# The -p port mappings were missing in the original post; 8000 for the
# backend and 80 for the frontend are assumed here
docker run -d --name taiga-back -p 8000:8000 -e API_NAME=$API_NAME -v /data/taiga-media:/usr/src/app/taiga-back/media --link postgres:postgres lxndryng/taiga-back
docker run -d --name taiga-front -p 80:80 -e API_NAME=$API_NAME -e API_PORT=$API_PORT -e API_PROTOCOL=$API_PROTOCOL --link taiga-back:taiga-back --volumes-from taiga-back lxndryng/taiga-front
docker run -it --rm -e API_NAME=$API_NAME --link postgres:postgres lxndryng/taiga-back /bin/bash -c "cd /usr/src/app/taiga-back; python manage.py migrate --noinput; python manage.py compilemessages; python manage.py collectstatic --noinput"

GitHub repository

The Docker configuration for my spin on the Taiga Docker images can be found here.

Building a Naive Bayesian Programming Language Classifier

Mar 03, 2017

GitHub's Linguist is a very capable Ruby project for classifying the programming language(s) of a given file or repository, but it struggles when there isn't a file extension present to give an initial hint as to which programming language may be in use: given this lack of an initial hint, none of the clever heuristics that are present within Linguist can be applied as part of analysis of the source code. As part of a project I'm working on at the moment, I have around 32,000 code snippets with no file extension information that I'd like to classify, with the further knowledge that some of these snippets may not be in a programming language at all, but rather in a natural language, or may just be encrypted or encoded pieces of text. Applying the Pythonic "if it quacks like a duck, it's a duck" approach, a naive Bayesian method whereby we just see if a snippet looks like something we've seen in another language seems like it might work well enough.

So why a Bayesian method?

In the main: I'm lazy and not a particularly mathematically inclined person. I also wrote half of a dissertation on Bayesian methods as applied to scientific method, so I've got enough previous in this space to at least pretend I've got some background in the field. On top of that, Bayesian classifiers give us an easy way to assume that the incidence of any evidence is independent of the incidence of any other. We end up with a fairly simple equation for finding the probability of a given programming language given the elements of language we have in a code snippet.

                              P(n-gram 1|Language) * P(n-gram 2|Language) * ... * P(n-gram n|Language)
P(Language|Snippet n-grams) = -----------------------------------------------------------------------
                                   P(n-gram 1) * P(n-gram 2) * ... * P(n-gram n)

We end up with very small numbers here - so small that we get floating point underflow. To avoid this, we can use the natural logarithms of the probabilities on the right-hand side, and add rather than multiply them.
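As a sketch of what that log-space scoring looks like (the function and data names here are illustrative, not the project's actual code, and add-one smoothing is assumed for unseen n-grams):

```python
import math

def log_score(snippet_grams, gram_counts, total_grams):
    """Sum log P(gram|language), with add-one smoothing for grams never
    seen in training, instead of multiplying raw probabilities (which
    underflows to 0.0 for snippets of any real length)."""
    vocab_size = len(gram_counts)
    score = 0.0
    for gram in snippet_grams:
        count = gram_counts.get(gram, 0)
        score += math.log((count + 1) / (total_grams + vocab_size))
    return score

# Toy per-language n-gram counts standing in for trained data
counts = {
    "python": {"def": 10, "self": 8, "import": 6},
    "ruby": {"def": 9, "end": 12, "puts": 5},
}
totals = {lang: sum(c.values()) for lang, c in counts.items()}

snippet = ["def", "puts", "end"]
# The least negative log score wins: a Ruby-looking snippet scores
# highest for Ruby
best = max(counts, key=lambda lang: log_score(snippet, counts[lang], totals[lang]))
```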

How do we identify languages?

Linguist has six so-called "strategies" for identifying programming languages: the presence of emacs/vim modelines in a file, the filename, the file extension, the presence of a shebang (eg, #!/bin/bash) in a given file, some predefined heuristics, and a Bayesian classifier of its own, though with no persistence of the training data across runs of the tool. In this approach, we'll only be implementing the classifier, but using heuristic-like methods to supplement the ability of the model to accurately identify certain languages.

The first element of the classification model will be based upon n-grams, where n will be between 1 and 4. I want to be able to classify on the basis of single keywords (eg, puts in Ruby), as well as strings of words (eg, the public static void main method signature in Java).

At the core of this, we have a very basic tokeniser that should give us enough information to put together some tokens that will give us enough to go on and create the n-grams that will give us the ability to infer the language code snippets are written in.
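A sketch of such a tokeniser and n-gram generator (the names and the exact token pattern are mine, not the project's):

```python
import re

def tokenise(source):
    # Words/identifiers as single tokens; every other non-whitespace
    # character (punctuation, operators) as its own token
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*|[^\sA-Za-z0-9_]", source)

def ngrams(tokens, max_n=4):
    # All 1- to 4-grams over the token stream, joined with spaces,
    # so single keywords and keyword sequences both become features
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

tokens = tokenise("public static void main(String[] args)")
```

This gives us both `public` on its own and the `public static void main` 4-gram as evidence for Java.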

A simple improvement on this would be to remove anything that would add plaintext to the mix: comments, docstrings and the like. As I said above, I'm not really concerned with 100% accuracy: something that quacks like a duck might be enough for us to say that it's a duck here.

Languages I have to deal with

From a cursory look at the 32,000 snippets, I know that I definitely have to be able to identify and distinguish between Python, Ruby, C#, C, C++, x86 assembly (I think - this could be a rabbit hole and a half to go down) and Java. We can reasonably expect that differentiating between Java and C#, and between C and C++, will be painful and prone to error until we refine the model, given the similarities that these languages have to one another.

To start, I will just be attempting to demonstrate that my broad approach works with at least Python, Ruby, Assembly, C# and Java before looking to incorporate more languages.

Persistence of the probability data

With a Bayesian approach, we need to be able to refer to a trained model of what the probabilities of given features are for a given programming language in order to make a prediction of which programming language a given snippet will be. In order to do this, we need to store this probability information somewhere. For the sake of simplicity, I'll be doing this in MariaDB, with the basic schema below:

DROP DATABASE IF EXISTS bayesian_training;
CREATE DATABASE bayesian_training;
USE bayesian_training;
CREATE TABLE languages(
    id INT AUTO_INCREMENT PRIMARY KEY,
    language VARCHAR(20) UNIQUE NOT NULL
);
CREATE TABLE grams(
    id INT AUTO_INCREMENT PRIMARY KEY,
    gram VARCHAR(100) UNIQUE NOT NULL
);
CREATE TABLE occurences(
    gram_id INT NOT NULL,
    language_id INT NOT NULL,
    number INT NOT NULL,
    PRIMARY KEY(gram_id, language_id),
    FOREIGN KEY(gram_id) REFERENCES grams(id),
    FOREIGN KEY(language_id) REFERENCES languages(id)
);

Training of the model

To train the model, I used the following codebases:

  • Python
    • Django
    • Twisted
  • Ruby
    • Sinatra
    • Discourse
  • Java
    • Jenkins
    • Lucene and Solr
  • Assembly
    • BareMetalOS
    • Floppy Bird
  • C#
    • GraphEngine
    • Json.Net
    • wtrace

These are, in the realms of real-world code usage, pretty small samples to be going on, but should hopefully give us enough to get a system together that works.

How effective was our initial model?

In order to test how well we did, I tested the following files against the model:

  • linguist.rb (Ruby)
  • ZipFile.cs (C#)
  • flask/app.py (Python)
  • tetros.asm (assembly)

The results:

linguist.rb: [(7953, 'asm', -6416.136889371387), (8002, 'c#', -6630.869975742312), (3931, 'java', -6849.643512121844), (1, 'python', -6302.470348917564), (1763, 'ruby', -5991.090879392727)]
ZipFile.cs: [(7953, 'asm', -164549.47730156567), (8002, 'c#', -144878.96700648475), (3931, 'java', -152243.66607448383), (1, 'python', -158673.75993403912), (1763, 'ruby', -159188.1657594956)]
flask/app.py: [(7953, 'asm', -189603.1365282128), (8002, 'c#', -195084.66248479518), (3931, 'java', -196401.08214636377), (1, 'python', -171435.95779745755), (1763, 'ruby', -183980.2802635695)]
tetros.asm: [(7953, 'asm', -17961.240272497354), (8002, 'c#', -28535.4183269716), (3931, 'java', -28894.289472605506), (1, 'python', -28315.088969821816), (1763, 'ruby', -27569.732161692715)]

For all of the tested files, the maximum (least negative) of the log scores corresponds to the language we knew the file was written in: at least we're getting the right answers, for the most part.

Technical niggles

The way that they're constructed, the database queries used in the training stage can become incredibly large - too large for the default max_allowed_packet value of 1MB in my.cnf. Setting this to 64MB was sufficient to have all of my queries resolved.
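The change itself is a one-liner under the server section of my.cnf:

```
[mysqld]
max_allowed_packet = 64M
```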


The code used for this classifier can be found at GitLab. This may also be released to PyPI at some point.

ASUS Zenbook Pro UX501VW configuration for Linux

Feb 13, 2017

Never trust laptop OEMs if you want to run Linux on a laptop. Well, maybe the more sensible option is to buy laptops from vendors who explicitly support Linux on their hardware (the Dell XPS and Precision lines are supposed to be good for this, as well as the incomparable System76). All of this said, I own an ASUS Zenbook UX501VW and it is a good machine, just a little temperamental when it comes to running Linux, especially compared to my Lenovo Thinkpad X1 Carbon. Hopefully the following misery I went through will be of use to someone else with this laptop.

Graphics issues

Most people, upon booting any graphical live CD/USB, will be greeted with the spinning up of their laptop's fans followed by a hard lockup. Probably surprising to no one, this is an issue with the Nvidia switchable graphics: some ACPI nonsense occurs if the laptop is started with the Nvidia card powered down. There are two options for getting around this:

1. Disabling the Nvidia card's modesetting altogether

To do this, you need to set the kernel option of nouveau.modeset=0. The card will then not have modesetting enabled and therefore will not cause an issue once X loads.

2. Making it seem like you're running an unsupported version of Windows

This is witchcraft and I make no claims to understand how it works, but setting the kernel option acpi_osi=! acpi_osi="Windows 2009" stops the issue that causes the X lockups that occur usually.

Backlight keyboard keys

To enable the keyboard buttons for brightness adjustment to work (and brightness adjustment at all in some cases), the following kernel options need to be specified.

acpi_osi= acpi_backlight=native

These options aren't compatible with the second option above, so pick being able to do CUDA development on a laptop (come on, now) or being able to change the brightness. It was an easy enough choice for me.

Touchpad issues

This is a matter of luck: some of the models designated UX501VW have a Synaptics touchpad, which will work brilliantly out of the box. If you're a little less fortunate, you have a FocalTech touchpad - a touchpad that only this and a couple of other ASUS devices have. A quick way to tell is to test two-finger scrolling: if it works, you have a Synaptics touchpad - enjoy your scrolling. If it doesn't, you probably have the FocalTech.

There is, however, a DKMS driver available for this touchpad which is targeting inclusion in the mainline kernel. It might take a while to get there, but it will be supported by default soon enough. In the interim, cloning the git repository linked above, making sure you have the prerequisites installed (apt-get install build-essential dkms for Debian/Ubuntu-based systems) and running ./dkms-add.sh from within the directory should be enough to get you going.

Every time your kernel updates, you'll need to re-run ./dkms-add.sh.

Setting up open-source multi-factor authentication for Amazon WorkSpaces

Feb 09, 2017

AWS's Identity and Access Management (IAM) is a wonderful service that allows its users to leverage an incredibly powerful suite of fine-grained access controls to really implement the principle of least privilege to secure services hosted with AWS. It also happens to have a fairly simple multi-factor authentication (MFA) approach that uses the popular OATH TOTP (Time-based One Time Password) standard implemented across a number of virtual and hardware token generators.

Amazon's Desktop-as-a-Service WorkSpaces product, however, departs from this approach of de jure and de facto standards through TOTP and IAM for multi-factor authentication, instead leaving users of the service to use a second factor of their choosing, as long as it can be handled through RADIUS: a problem which has recently caused me some issue at work. In the hope that no one should have to deal with this in the same way I did, it's probably worth going over where the issues start and how they can be addressed. It's probably just best to ignore the fact that RADIUS is a technology best left in the nineties.

WorkSpaces authentication architecture, in broad strokes

To use multi-factor authentication at all, the directory service that supports the WorkSpaces for which one wishes to enable multi-factor authentication must be a Microsoft Active Directory (AD) instance - if you were using Amazon's Simple Directory service, you're going to have to start again. In order to leverage an external AD instance, it is necessary to deploy an Amazon service called AD Connector into two subnets (for High Availability purposes; let's ignore the fact that we're only running a single domain controller - don't worry about it) within the VPC that is being used to host the WorkSpaces instances. It is the AD Connector service that provides the magic that will allow us to put an MFA solution into practice: a RADIUS server to be used as the second factor for authentication can be configured here.

In the context of the MFA flow for WorkSpaces, the public-facing brokering service that Amazon provides to enable connectivity over PCoIP to the WorkSpaces instance orchestrates the authentication flow such that initial authentication occurs with username and password against AD, with the username and provided MFA token being submitted to the RADIUS server in the event that the initial authentication is successful.

So what do we need?

If we assume that we don't have any authentication infrastructure outside of what we're putting in place for WorkSpaces, aside from the AD domain to be used as the first factor, we need:

  • A RADIUS server
  • A means to generate one-time passwords
  • Management infrastructure for the one-time passwords

Things of note

If your organisation's MFA approach is premised upon TOTP, you're going to be doing something non-compliant with that approach for WorkSpaces: Amazon (at time of writing) explicitly - in a footnote in an FAQ - warns its users that TOTP is not supported. The only standardised OTP process that remains, then, is HOTP - an algorithm that relies upon a counter, so will probably cause you operational issues with users who may generate tokens without using them.

It is also necessary (for fairly common-sense reasons) that the user store your RADIUS server calls out to has exactly the same usernames as those present in your Active Directory; case sensitivity can be an issue here.

An open-source product selection for RADIUS and OTP generation

Much as my first inclination will be to prefer open-source solutions, the first port of call for the RADIUS service to support WorkSpaces was Microsoft's Network Policy and Access Services: a quick answer was preferable to a philosophically satisfying one. It turned out, however, that there was no OTP verification service available for integration with NPS that didn't cost money. Given the speed needed to get an MFA solution bottomed out, working through a corporate commercial process didn't seem that appealing.

The aptly named FreeRADIUS seemed a sensible choice for the RADIUS server, given its flexibility and support for a number of means of authorisation - in case no OTP mechanism would readily become available. After a little bit of digging, I found a PHP script/class called multiotp that seemed to offer the functionality that I required (HOTP), as well as being able to integrate with user lists from an AD server.

A VPC design for MFA-supporting WorkSpaces

Amazon VPC for MFA

MultiOTP configuration

While multiotp does offer a MySQL backend (which could be made readily resilient using AWS's Relational Database Service), this has been omitted here for the sake of simplicity: the flat-file backend should be sufficient as long as we are sensible about syncing it between the two RADIUS servers. In order to set up a user, we can issue the command:

multiotp -create -no-prefix-pin [username] hotp [hexadecimal-encoded-seed] 1111

OpenSSL's rand command can be used with its -hex switch to generate a random hex-encoded string to provide a suitable seed.
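For example (the 20-byte length is my choice here, giving the 160-bit seed recommended by RFC 4226 for HOTP):

```shell
# 20 random bytes, hex-encoded, to use as the HOTP seed
openssl rand -hex 20
```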

Following this, we can generate a QR code for the user to use to register WorkSpaces as an application in their software token generator:

multiotp -qrcode [username] [file_path_for_png.png]

multiotp caveats

multiotp maintains its own user database, and its OTP-specific AD user synchronisation makes some unfortunate assumptions about the types of OTP you'll want to use: it will import all users with the assumption that they will be using TOTP, rather than the HOTP required here, and the CLI does not currently present configuration options to change the algorithm associated with a user once set, or to change the default algorithm used for AD-synced users.

I expect I'll submit some pull requests against their GitHub project in the fullness of time, but even with the caveats made explicit above, it does seem to be the most competent means of returning a RADIUS-compatible response from an OTP generation algorithm (a number of, in fact). Until Amazon does something more sensible around MFA for WorkSpaces, something that works will suffice when the alternative is nothing.

The seed provided to multiotp needs to be a hexadecimal string, while software tokens (such as Google Authenticator) will often require a base32-encoded version of the string used to generate the first hex string. The easiest way around this is to use the token generation algorithm in the multiotp class and just generate a QR code using the CLI to be distributed to the user.

FreeRADIUS configuration

With a user account to actually test against, we need to configure FreeRADIUS to hand-off authentication requests to multiotp. In order to do this, we create a new module for authentication in /etc/raddb/modules/ called multiotp with the content below:

exec multiotp {
    wait = yes
    input_pairs = request
    output_pairs = reply
    program = "/path/to/multiotp '%{User-Name}' '%{User-Password}' -request-nt-key -src=%{Packet-Src-IP-Address} -chap-challenge=%{CHAP-Challenge} -chap-password=%{CHAP-Password} -ms-chap-challenge=%{MS-CHAP-Challenge} -ms-chap-response=%{MS-CHAP-Response} -ms-chap2-response=%{MS-CHAP2-Response}"
    shell_escape = yes
}

In the file /etc/raddb/sites-enabled/default, the directive multiotp should be added before the first instance of pap in the file, and the first instances of chap and mschap commented out with a hash (#). Additionally, the following should be added prior to the first Auth-Type PAP in the file:

Auth-Type multiotp {
    multiotp
}

In /etc/raddb/sites-enabled/inner-tunnel, the first instances of chap and mschap should again be commented out, and the following added prior to the first Auth-Type PAP directive:

Auth-Type multiotp {
    multiotp
}

The authorisation policy established above then needs to be enabled in /etc/raddb/policy.conf by adding the following just before the last } in the file:

multiotp.authorize {
    if (!control:Auth-Type) {
        update control {
            Auth-Type := multiotp
        }
    }
}

In /etc/raddb/clients.conf, appropriate client information should be populated, with [RADIUS shared secret] being replaced with a secure shared secret to be used to establish the RADIUS connection between AD Connector and FreeRADIUS:

client {
    netmask = 0
    secret = [RADIUS shared secret]
}

Start the RADIUS server in debug mode with radiusd -X and leave it running: upon setting up the RADIUS server in AD Connector, we'll be able to make sure that things are working as they should here.

AD Connector configuration

In the AWS Console, MFA can be activated through the Update Details menu for directories defined within the WorkSpaces service. Enter the IP address of your RADIUS server and the shared secret defined earlier within the Multi-factor Authentication section.

WorkSpaces MFA screen

Upon clicking Update, you should see an authentication request from a user called awsfakeuser with the password badpassword. After a few minutes, the RADIUS service will be registered for the WorkSpaces directory. From here, try generating an MFA code for real and signing into a WorkSpace using the WorkSpaces client.

Flask, Safari and HTTP 206 Partial Media Requests

Jun 08, 2014

While working on my Python 3, Flask-based application Binbogami in order that a friend would be able to put their rebirthed podcast online, a test scenario that I hadn't thought to check came to light: streaming MP3 in Safari on an iOS device. It turns out that attempting to do this resulted in an error in Safari along the lines of the below:

Safari iOS error

A little more investigation showed that this error was repeated in Safari on OS X. Given the unfortunate trinity of erroneous situations that Binbogami seemed to fall foul of, it seemed that the problem lay with how Safari, or QuickTime as the interface for media streaming under Safari on these platforms, was attempting to fetch the file.

The problem

A cursory DuckDuckGo search revealed that where Firefox, Chrome, Internet Explorer and Opera all use a standard HTTP GET request for fetching media, even where this media could be considered to be streamed, Safari's dependency on QuickTime for media playback means it behaves differently: upon attempting to fetch a file, QuickTime makes an initial request for the first two bytes, using the Range request header, to determine the file's length and other header-type information, with Range requests for the bytes beyond these first two made subsequently.

By default, the method I was using in Flask to serve static files does not issue the HTTP 206 response headers necessary to make this work, and pays no heed to the range of bytes requested in the request headers.


While it seemed apparent that implementing the correct headers in the HTTP response and some sort of custom method to send only the requested bytes within a file would be the way around this, my head was not particularly in the space of implementation. Again, with some internet searching I came across an instructive blog post that appeared to have a sensible answer. With a little bit of customisation to suit my own particularities:

import os
import re

from flask import Response, current_app, request, send_from_directory


def send_file_206(path, safe_name):
    range_header = request.headers.get('Range', None)
    if not range_header:
        # No Range header: an ordinary GET, so serve the whole file
        return send_from_directory(current_app.config["UPLOAD_FOLDER"], safe_name)

    size = os.path.getsize(path)
    byte1, byte2 = 0, None

    m = re.search(r'(\d+)-(\d*)', range_header)
    g = m.groups()

    if g[0]: byte1 = int(g[0])
    if g[1]: byte2 = int(g[1])

    length = size - byte1
    if byte2 is not None:
        length = byte2 - byte1

    data = None
    with open(path, 'rb') as f:
        f.seek(byte1)
        data = f.read(length)

    rv = Response(data, 206, mimetype='audio/mpeg', direct_passthrough=True)
    rv.headers.add('Content-Range', 'bytes {0}-{1}/{2}'.format(byte1, byte1 + length - 1, size))
    return rv

A secondary issue

While the above did lead Safari to believe that it could indeed play the files, it would always treat them as "live broadcasts", rather than MP3 files of a finite length. This is due to the way in which QuickTime establishes the length of a file through its initial requests for a few bytes at the head of a file: if it cannot get the number of bytes that it expects, it ceases trying to issue Range requests and instead issues a request with an Icy-Metadata header, implying that it believes the file to be an IceCast stream (WireShark is a wonderful tool).

The issue in the above code is found in the byte1 + length - 1 statement in the issued Content-Range header: where Safari requests two bytes in its first request (so the Range header will look like Range: bytes=0-1), length evaluates to byte2 - byte1 = 1, meaning only the 0th byte is sent - not the 0th and 1st bytes as requested. The file still looks like a valid MP3 file, however, so Safari requests the whole file as a stream - therefore leading to the "Live Broadcast" designation.

A simple fix was to add +1 to the length declaration, to make it length = byte2 - byte1 + 1.
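The arithmetic is easy to check by hand for Safari's probe request:

```python
# Safari's probe request: Range: bytes=0-1, i.e. the 0th and 1st bytes
byte1, byte2 = 0, 1

buggy_length = byte2 - byte1      # 1 byte served: Content-Range: bytes 0-0/...
fixed_length = byte2 - byte1 + 1  # 2 bytes served: Content-Range: bytes 0-1/...

print(buggy_length, fixed_length)  # 1 2
```

Byte ranges in HTTP are inclusive at both ends, which is the root of the off-by-one.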


It's interesting to see how much implementations of media downloading functionality in mainstream browsers can differ based upon the technology underlying them. In the case of Safari, however, its approach seems somewhat contrary to the major use case: most people using the browser to access a media file will be seeking to download it, rather than "stream" it in a traditional sense.

Safari's approach also has the downside of generating a lot of HTTP requests, which as a systems administrator can cause havoc if you're yet to set up your log rotations for your webserver and application server container (Nginx and uWSGI in this case). It hadn't been long enough since I'd seen a high wa% in top.