Category Archives: work

Private key errors from Logstash when shipping logs over SSL using filebeat

We use the filebeat shipper to ship logs from our various servers, over to a centralised ELK server to allow people access to production logs.

Because the logs are shipped from various data centres, the filebeat shippers are configured to send logs using SSL. To achieve that, we followed the how to guide. For the curious, that means that each of our servers has a filebeat.yml configuration file containing the following excerpt,

output.logstash:
    hosts: ["hostname:5044"]
    ssl.certificate: "/etc/ssl/certs/our.crt"
    ssl.key: "/etc/ssl/private/our.key"

and the inputs config file on our ELK server contains the following,

input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/ssl/certs/our.crt"
    ssl_key => "/etc/ssl/private/our.key"
  }
}

I upgraded Logstash this morning, from 6.2 to 6.4 and found that Logstash wouldn’t start. When I inspected the Logstash logs I found errors as follows:

[2018-08-30T10:58:50,842][ERROR][logstash.inputs.beats ] Looks like you either have a bad certificate, an invalid key or your private key was not in PKCS8 format.

I checked the filebeat shipper logs too, and found errors similar to the following:

2018-08-30T11:26:02Z ERR Connecting error publishing events (retrying): read tcp ip.address.here:45896->ip.address.here:5044: read: connection reset by peer

In actual fact, I needn’t have gone further than the error that I found in the Logstash error log which said, “your private key was not in PKCS8 format”.

The SSL key that we had shipped to the box was not in PKCS8 format, which according to the documentation is a requirement. The fix in the end was simple, and was provided in this comment against a github issue. I needed to create a PKCS8 version of our existing key. To do that, I ran the following command,

openssl pkcs8 -in /etc/ssl/private/our.key -topk8 -nocrypt -out /etc/ssl/private/our.p8

To use the new PKCS8 version of the key, I also needed to update the beats input file to use the new our.p8 key file and then restart Logstash. The revised config is as follows:

input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/ssl/certs/our.crt"
    ssl_key => "/etc/ssl/private/our.p8"
  }
}

Hope this helps someone else!

Advertisements

Getting Image and Flavor information from the Rackspace API

At Gamevy we’re lucky enough to have been accepted onto the Rackspace UK Startup Programme. In combination with that, we’ve been using Ansible to deploy new servers, and perform application deployments.

When you create a new server at Rackspace using Ansible rax module there are two pieces of information you’ll need from Rackspace. One of those is the flavor, this defines the instance type (the configuration of the instance – e.g. how much RAM it is given). The second is the id of the image (operating system) you wish to use to create the server.

To date, we’ve not found an easy way to get this information. Our current method is to use cUrl, get an Access Token, and to then make further requests to the API to get the required information.

Given this is an infrequent task, I normally forget the necessary steps. I thought that by noting them down I might save myself, and others, some time in the future. This post assumes that you already have a Rackspace account, and that you have API access via a key which is associated with your account. If that’s not the case, Sign Up here, and then follow these steps to get your API key.

It’s worth noting that this post focuses on accessing functionality provided by Rackspace UK. If you’re accessing from the US, the steps are the same, but  you’ll need to change the URLs that are accessed. Full information about Rackspace’s API can be found at their documentation site.

Getting An Auth Token

First things first, let’s get an Auth Token. To do this, you’ll need a few pieces of information from your Rackspace account: your username, your API Key and your account number.

Login to your Cloud Control Panel, and in the top right hand corner of the screen you’ll see your username with an arrow pointing down next to it. Clicking on that will reveal a menu at the top of which will be your account number, note that down, you’ll need when you make calls to the API, later.

From the menu, select Account Settings. On the resulting screen, access your API secret and note that too.

Create a file, call it whatever you want, I called mine rackspace-creds.json on your machine (I put mine in my home directory, for ease of access). The file should contain the following:

{
    "auth":
    {
        "RAX-KSKEY:apiKeyCredentials":
        {
            "username": "YOUR_RACKSPACE_USERNAME",
            "apiKey": "YOUR_RACKSPACE_API_KEY"
        }
    }
}

To get your Auth Token, open up a terminal and issue the following command

curl https://identity.api.rackspacecloud.com/v2.0/tokens -X 'POST' \
-d @rackspace-creds.json -H "Content-Type: application/json" | python -m json.tool

I’ve piped the output of the cUrl command through python’s json tool to prettify it. In the response you receive, you’ll need to grab your :

"token": {
            "RAX-AUTH:authenticatedBy": [
                "APIKEY"
            ],
            "expires": "",
            "id": "YOUR_AUTH_TOKEN",
            "tenant": {
                "id": "",
                "name": ""
            }
}

List The Flavors And Images Available To You

Now that you’ve got an Auth Token, you can go ahead and make calls to the API endpoints for Flavors and Images. I’ve pasted the corresponding examples below:

Flavors

curl https://lon.servers.api.rackspacecloud.com/v2/YOUR_ACCOUNT_ID/flavors \
> -H "X-Auth-Token: YOUR_AUTH_TOKEN" | python -m json.tool

Images

curl https://lon.servers.api.rackspacecloud.com/v2/YOUR_ACCOUNT_ID/flavors \
> -H "X-Auth-Token: YOUR_AUTH_TOKEN" | python -m json.tool

 

Iterating an ansible task using a dictionary object of variables

Recently we have been writing ansible scripts to deploy one of our applications. In doing so, we came across the need to configure 3 node processes as services. All of these node processes used a similar init script but required different values writing into the script.

We chose to use the template task thinking that with the 4 variables we needed to populate on each occasion we should be able to use the with_items method on the task to do what we wanted. We did plenty of Googling and couldn’t find a way forward and so were all set to revert to writing three separate implementations of the template task, and three associated templates.

Asking a question on the ansible IRC channel eventually yielded an answer and I thought I would share our implementation.

We start with a generic template, service_init_script.j2. The script itself is not important to this blog post, you should note that way in which the ansible ( / jinja2) variables refer to item.foo – this is important, and was our initial stumbling block. Each time the task is iterated upon the items within the with_items block are referred to as an item:

#!/bin/bash
### BEGIN INIT INFO
 # Provides: {{ item.service_name }}
 # Required-Start: $all
 # Required-Stop: $all
 # Default-Start: 2 3 4 5
 # Default-Stop: 0 1 6
 # Short-Description: {{ item.service_description }}
 ### END INIT INFO
# Taken loosely from https://gist.github.com/tilfin/5004848
 prgcmd={{ item.service_exec }} # What gets executed?
 prgname={{ item.service_name }} # What's the name (used to ensure only one instance is running)
 prguser={{ item.service_user }} # Which user should be used
 pidfile=/var/run/{{ item.service_name }}.pid # Where should the pid file be stored?
 logfile=/var/log/{{ item.service_name }}.log
export BJA_CONFIG={{ bja_config }}
start() {
 if [ -f $pidfile ]; then
 pid=`cat $pidfile`
 kill -0 $pid >& /dev/null
 if [ $? -eq 0 ]; then
 echo "{{ item.service_name }} has already been started."
 return 1
 fi
 fi
echo -n "Starting {{ item.service_name }}"
nohup start-stop-daemon -c $prguser -n $prgname -p $pidfile -m --exec /usr/bin/env --start $prgcmd >> $logfile 2>&1 &
if [ $? -eq 0 ]; then
 echo "."
 return 0
 else
 echo "Failed to start {{ item.service_name }}."
 return 1
 fi
 }
stop() {
if [ ! -f $pidfile ]; then
 echo "{{ item.service_name }} not started."
 return 1
 fi
echo -n "Stopping {{ item.service_name }}."
start-stop-daemon -p $pidfile --stop
if [ $? -ne 0 ]; then
 echo "Failed to stop {{ item.service_name }}."
 return 1
 fi
rm $pidfile
 echo "."
 }
status() {
if [ -f $pidfile ]; then
 pid=`cat $pidfile`
 kill -0 $pid >& /dev/null
 if [ $? -eq 0 ]; then
 echo "{{ item.service_name }} running. (PID: ${pid})"
 return 0
 else
 echo "{{ item.service_name }} might have crashed. (PID: ${pid} file remains)"
 return 1
 fi
 else
 echo "{{ item.service_name }} not started."
 return 0
 fi
 }
restart() {
 stop
 if [ $? -ne 0 ]; then
 return 1
 fi
sleep 2
start
 return $?
 }
case "$1" in
 start | stop | status | restart)
 $1
 ;;
 *)
 echo "Usage: $0 {start|stop|status|restart}"
 exit 2
 esac
exit $?

We then wrote our ansible script and call the template task thus:

- name: Create the service init scripts
template: src=service_init_script.j2 dest=/etc/init.d/{{ item.service_name }} owner=root group=root mode=755
with_items:
- { service_name: paymentHandler, service_description: 'Handles payments', service_user: blackjack_attack, service_exec: "{{ application_directory }}/apps/paymentHandler.js" }
- { service_name: tablePool, service_description: 'The pool of tables available', service_user: blackjack_attack, service_exec: "{{ application_directory }}/apps/tablePool.js" }
- { service_name: userManager, service_description: 'Manages users', service_user: blackjack_attack, service_exec: "{{ application_directory }}/apps/userManager.js" }
sudo: yes

So, we are calling the same template task, which in turn refers to the same jinja2 template, passing in a collection of dictionary objects.

The key point to understand here is that when the playbook is run ansible enumerates the items specified referring to each object as ‘item’. This is the reason why we had to prefix the variable names in the template with item.*

I’ve exposed both of these pieces of code as gists here, and here.

Lessons Learned Working in a Service Company Servicing Large Projects

I was fortunate enough recently to attend LeanCamp, an unconference organised by Salim Virani and aimed at people interested in the Lean Startup meme. I was reminded the other day of one of the sessions I attended. “Lessons Learned Bootstrapping a Service Company” was run by Chris Parsons about his experiences in starting his company, Eden Development.

Of the conferences that I have been to recently, I’ve found myself gravitating towards sessions that take an experience report format more. I like them as they’re nothing more than observational. Whilst they do often focus on one subject in particular the people presenting are giving you an opportunity to listen to their experiences, good and bad of trying to do something one way or another and not just trying to sell you a mentality / ideology. I do find sometimes that they let themselves down in the respect that evidence provided of the effects is anecdotal instead of quantitative. Happily, this was a less formal session and so that wasn’t a concern.

At 7digital we’re starting to near the end of the first phase of a project that is large in comparison to the normal size of those that we do. To be specific, we’ve been involved in the project since June last year and have been actively working to deliver features for the client since early November. It has consumed at a minimum, 4 developers full time and for a couple of months, 8. As I suggest, not large necessarily in comparison to other projects within the industry, but large for us.

I was reminded of Chris’ session as I start to reflect on what I have learned in delivering this project. Chris outlined 5 key lessons in a slide deck and then spoke about them. I thought that I might use those headings as the basis for my own blog post in respect of what I have learned.

1. Learn How to Say No

Chris’ point in the presentation was that he wished that he’d learnt to say no more often to pieces of work that he took on in the formative stage of Eden.

I’m certainly not in a position to be able to make those decisions outright at 7digital nor for that matter would we as a company want to; we’re in a privileged position that allows us to work with some great clients on interesting pieces of work that suit all concerned.

In the case of this project in particular owing to the client’s incumbent delivery process, we’ve had to accept that we have to work in a way that is contrary to the way in which we’re happiest delivering software. In contrast to Chris’ point, I think what I’ve learnt or at least had reiterated to me, is that saying yes to something but in doing so asking why, is very powerful. On a number of occasions I’ve been able to ask why a practice is used and in doing so understand 2 things; the context that surrounded the practice and secondly, the way in which the practice is viewed. With those bits of information I’ve been able to bring some subtle but nonetheless tangible change to the project and hopefully have contributed to making it more successful as a result.

2. Hire Passion over Experience

I don’t have notes that have the specifics of Chris’ points here but remember agreeing wholeheartedly with him. I think he made the point that there was a correlation between those that were passionate about development and the breadth of their skill set and ability to pick up say, a new language.

There isn’t really anything within this point that I can speak about that isn’t only tenuously linked and so I am leaving comment in this section.

3. Spend Where the Value Is

On this point, Chris spoke about the fact that at Eden, rather than having invested in tailored office space from the outset as so many other startups do (along with presumably, micro scooters / space hoppers / foosball tables etc), they invested in things that mattered such as decent chairs and machines for their staff.

In deference to our typical approach to development, on this project we did a lot of up front analysis, some 2 or 3 months, which at the time felt a little uncomfortable. Reflecting upon this now I think we did about the right amount and instead of just documenting requirements we spent a lot of time challenging them in order to understand the problem that was attempting to be solved. We asked the client to give us ways in which we could measure success of the software we were to deliver and most importantly, we tried to understand what the smallest piece of work we could do for the client to suffice their requirement. This is not to say that we’ve done half a job, far from it. I think that whilst we were constrained to an extent by the project bias this piece of work took, we have attempted to embed an incremental approach where possible. I think that this effort was worth the investment of our time and money, we haven’t to date included the work done in any bill but it has paid for itself in my opinion. We ended up with a very discreet set of functionality that services a requirement and nothing more. We didn’t end up developing software that isn’t going to be used and the software that we have delivered is in the most part, measurable aiding our desire internally to automate all of our acceptance tests.

4. Cashflow

Here, Chris spoke about how having some savings as a company has helped Eden move from a position where they were chasing all types of work even that that was underpaid to one where they can afford to be a little more selective about the type of work or the clients that they work with.

For us in respect of this project, it’s certainly not work that we would have turned away. It has been a great piece of work to be involved in. My main lesson here has been that there is a hidden cost often not consider associated to development of this size in a department the size of ours. As I stated at the beginning of this article, this project has utilised up to 8 developers for a time which equates to roughly a third of the department. Because of the way the department is structured, we have 4 product teams and the fact that this work spanned 2 of those, it heightened the need for effective communication and the need for collaboration between the teams which has a cost associated with it. When considering work of this relative scale again, I will be sure to consider that fact.

There is also the matter that undertaking a large body of work such as this for us could have presented problems in our ability to undertake other smaller pieces of work; we might have missed out on opportunities elsewhere. Thankfully we haven’t, or at least we haven’t noticed any that we have.

5. Have a Higher Purpose

Finally, Chris spoke about the fact that in setting up Eden his desire was always to be building great software for clients. He went on to say that building a company is made up of a thousand small decisions and that it’s important to let the people who work for you to make those decisions though the only way that that would be successful is to have a clear vision that everybody knows and understands. He suggested that he has been known to “pop quiz” people within Eden on it.

In the Development Team here at 7digital we’re continually striving to produce software that is the right thing and is to a high standard of quality. From the moment that we set out on this endeavour we have strived aided by the Green Field nature of this work, for sure, to deliver against those aspirations and based on the measures that we have in place internally, I think we succeeded. One example that I would cite is this; we found that there was a misunderstanding owing to assumptions within everybody’s understanding of our acceptance criteria for one of the features. Unfortunately this wasn’t caught until fairly late on. Given that we had a suite of unit, integration and system level tests we were able to quickly understand the impact of this new requirement and go on to make the change. Having worked on many different projects throughout my career I’ve rarely before been afforded such confidence that a requested change would be successful and have so little impact. Anecdotal evidence I know, but enough for me in this case.

Not Time Boxing or Commitments but Managing Risk by Using SLAs

Recently, I’ve been working closely with one of the teams here and consequently have lead some changes to the dashboard that they keep for use in their planning sessions with stakeholders and for my benefit in being able to track costs associated to any client development we’re doing. Having been keeping the data in this team now for about a year it’s proved incredibly useful.

Owing to the nature of the stakeholders that provide direction in respect of the teams’ pipeline, we’ve often seen situations where new features have been prioritised over features that have been in the backlog for some time, something that is quite normal I’m sure. In turn though, this has then meant that when the team does come to start the work pertaining to the features that were entered in to the backlog first, the delivery date that was originally suggested to the external client will have already passed. In those cases, whilst the feature has been estimated using “T-Shirt” sizing we’ve only then tracked how long the team took to do the work. This is not a problem in itself, but a recent observation made by the team was that they felt that it didn’t necessarily help them make informed decisions in terms of how to deliver the feature.

A while ago, the team stopped using iterations opting for more of a flow based model, we’ve been implying a Work in Progress limit by restricting the amount of streams available to feature work but because of Hidden Work previously discussed, amongst other things, these often failed to focus the team in respect of delivery too.

In a recent prioritisation session with the stakeholders it was decided that we would focus efforts on improving the performance of a couple of the endpoints within an API.  It was decided that owing to the exploratory nature of the work, the developers undertaking the work would do a time boxed spiked.

It was at this point that I started seeing a possible way of getting the team to focus in on a delivery date. At the time, I thought that it would be reasonable to use the data relating to the estimates that we’d previously made to also time box future feature work. As can be seen in the image below, in the team’s dashboard we display the minimum, maximum, and standard deviation (variance) for all features within a T-Shirt size. Extending that  so that we used it to make a commitment to our clients / stakeholders that for example, anything deemed to be be a medium would be done in 19 days, the mean for that size.

Days Taken, Committed to Done by T-Shirt Size Including Variance

Thankfully, I realised the mistake I was making. I was suggesting that a team should take a piece of work which granted they had done some analysis on, and to make a commitment that they would deliver it in a set period of time. This is tantamount to asking a team to make a commitment at the beginning of an iteration to deliver a certain amount of functionality, based on whatever measure they are using at the time. To tend towards an iteration based delivery mentality is to tend towards one of two types of failure in my opinion: Under Commitment from the team so that they know they have more chance of meeting the expectations placed upon them; good for their moral but not great for the overall productivity. Alternatively, making a commitment which the team has no way of knowing that they can meet which can be demoralising for the team over time.

So What Instead Then?

The solution for us was born out of the fact that we keep so much data in relation to the features that we have delivered.

Recently, we have started to chart the data in a different manner to the one above, displaying the time taken for all features from when the team committed to doing them, to when they were released to the production environment (i.e. they were done).

All Features, Committed to Done

Visually, this is not necessarily an immediate improvement. The obvious thing that does now present itself though is just how much variance there is in the size of the features that the team is doing.

The chart is improved though when series displaying the Mean and Standard Deviation are added, see below:

All Features, Committed to Done with Mean and Std Deviations

However, it is still just a chart showing a lot of variation which when you consider all the features ever delivered by the team, you would expect and the Standard Deviation which needs no explanation. Some might be questioning what use the above is at all, well the benefit for us is in reinforcing the volatility and inherent risk in any estimate. This is complemented massively by the table below.

Table Showing Cycle Item by Type

Above, the information we use the most is Number of Items; the size of sample set, the Average Plus Standard Deviation; the average Cycle Time for an item with an allowance for some risk and the Percentage of Items Greater than the Average Plus Standard Deviation; the risk in using the number as the basis for any estimate.

When combined this data allows us to do one important thing for our stakeholders. In the absence of any deadlines we can say that we endeavour to deliver any item according to its type within a certain timeframe (arithmetic mean for the sample set + 1 standard deviation), hence the team are defining an SLA.

What’s important to note though here is that the SLA is not a commitment; it is information relevant to the team to enable them to focus on the risk that is always present in delivering software. These dates are being used currently by the the team in planning exercises as indicators and by putting the dates on the cards that go through the wall (using a post-it note on top of the card) to help them focus on delivery and furthermore and perhaps more importantly, to allow them to manage the risk on a daily basis. At the stand up when talking about cards they can focus specifically on how to increase their chances of delivering the feature within SLA.

It’s early days for this experiment and there isn’t enough data as yet to suggest that it is either succeeding or failing. Anecdotally though, it seems to be having a positive impact.

The Cost of Hidden Work

I’ve been working with one of the teams here recently all of whom have been feeling that the pace at which they deliver features is slower than they would like. I’ve run a number of sessions in order to gather data about the work they’ve been doing and how it flows through their wall. During one of the sessions, we focussed on the types of work that they are asked to do of which support was noted as one flavour.
Image of the Team's wall showing support streamThe team currently structure their board around the ongoing need to service this support need, see an image of their current implementation left. One of the 5 developers within the team will work solely within the Support Stream seen at the bottom of their wall for an undetermined amount of time.
The support items are typically raised through our issue management application and usually carry an expectation with them that they will be worked on immediately irrespective of their size. The team has tried implementing an SLA around response times in respect of support items before but for one reason or another (discussion of which is outside the scope of this article), it has never really taken hold (We have in the sessions discussed Classes of Service details of which will be covered in a subsequent post.).
Back to the team’s opinion that they weren’t delivering as fast as they might like though; in our discussions I stated that I felt that part of the reason was that in undertaking the support items they were undertaking a lot of what was effectively hidden work. It was work that was also implicitly being prioritised above all the other items that had previously been agreed as highest priority in a prioritisation session.
We didn’t manage to reach a conclusion on what to do about the support stream though and so the conversation continued outside of the meeting. One of the Developers in the team noted what he saw to be the pros and cons of the support stream which with his permission, I’ve noted below:

Pros

  • less distraction to the rest of the team
  • less confusion who’s looking into what
  • faster responses
  • fixed amount of resources dedicated to support – support cannot slow up feature streams
  • 1 person dealing with each support request = less overhead in understanding the problem

Cons

  • support rota not evenly distributed amongst team members
  • faster responses = less pressure on business to ask for automated solutions
  • pre-sale enquiries don’t fit well, so would still cause distraction
  • 1 person dealing with each support request = issues are hidden from rest of the team

I then followed up with a rather lengthy response which is the reason this post has emerged:

My issue with support is this; support items are essentially features that are being requested. In that respect, they should be prioritised among all the other features. The fact that there is a specific stream for support suggests that they are automatically accepted as being higher priority than the other features in progress and that is certainly not the case. Furthermore in prioritising them above the other features those other features are affected.

I think that it is far more prudent to prioritise the work you are doing on a daily basis, as a team and include the support items in that prioritisation. In my opinion, it is not so important that the rest of the team are not distracted as per the first point in the pros section above and that instead what is important is that you are as a team working on the highest priority items. Only then will you ever work together as a team to establish a flow. In that, I recognise that there is absolutely value in disrupting the flow of another item to get a major bug fixed, as there is also value in understanding why you are dealing with a support request and in the case of the why, you will only ever understand that by having a discussion with the other members of the team. I would hope that once you understand why as a team, you can then incorporate any learning in to an improvement cycle on a far more regular basis than waiting for your next retrospective at which time the item may have either been forgotten in the larger context or not voted as the highest priority thing to be attended to.

To the points above though; I’ve already touched on the first point in the pros section and as for the second point, I think that a clear definition of the role of the person acting floating manner will clear this up. On the third and forth points in respect of faster responses; as I’ve touched upon above, this may well be the case in respect of the items that you now classify as support but the net effects are that the other items are slower to deliver, I don’t have any data to substantiate this claim but I will point you at Little’s Law which suggests that the number of items in process is closely related to the overall time taken to process those items. On the fifth point, this is in direct conflict with our ambition to deal with one of the biggest problems we face as a department, that a select few that know more about our software and systems than anyone else; as well as the received opinion that two heads are better than one.

In terms of the cons listed above, the first point I may not understand fully but if I do understand it, surely a weekly system of having a floating role that fulfils a number of duties (which we can define as a group but for example: Monitoring the support queue, leading the stand up in the morning in discussing the blockers, the support requests that have come in and discussion about the work that is in progress.) would give a fair split of the responsibilities. I think I agree with the second point made but I would add this, the “business” (which is a term we should not allow to propagate lest we fall deeper in to a them and us mentality) classifying a support item as a feature and having a conversation about its priority and how to achieve it as a team which is what I am proposing, would I hope lead to other solutions being suggested and indeed the emergence of those automated systems bit by bit. If as a team you recognise that you have a little longer to undertake the item, schedule among the other things that you are doing, deliver it in a timely manner but put the building blocks in place for something that you can extend as time goes by. I’m in agreement with the fourth point about pre-sales enquiries as are we all I think, as we’ve now included it in the activities that we map on the board. Pre-Sales is a first class activity that everybody should have visibility of and also, should be able to undertake.

How Has This Concluded?

Firstly, the support stream hasn’t been removed from the team’s wall (as yet). They do now a rota in place though that clearly shows who is acting in that capacity. Furthermore and most importantly in my opinion, they now do prioritise the work within the context of the other features and don’t necessarily respond immediately but at a time that is convenient. They’ve also stopped tracking support items specifically opting instead to consider them features which are sized thus give us more reliable data about lead and cycle times and again and importantly actual visibility of the capacity the team has to do work.

Plan The Work But Don’t Work The Plan

I tweeted yesterday something that I had overheard to which a friend of mine, Toby then responded. As Toby then went on to suggest, the five tweets that I responded back with were a great analogy for why batching should be avoided (it had nothing to do with the 140 character limit, it was intentional, honestly.)

I thought that a blog post would allow me to better justify why I tweeted what I had overheard. My primary concern with the statement that was made is that in my opinion, batching all the requested items together to avoid somebody having to rework a plan multiple times slows down the delivery of the highest priority item the effect of which is to deliver value back to the person(s) that requested the work later.

Tying the team up on a larger bodies of work has further affects though. Firstly I think it means that it is less likely that a shorter feedback loop would exist which in turn, would mean that it would be easier for both the team and the person(s) requesting the work to not consider the smallest piece of functionality that could be delivered to meet the requirement. It starves the person responsible for prioritising the work of options to change, it asks for a commitment to deliver all requested items well in advance of their actual start date. Lean and Real Options would suggest deferring your commitment to the last responsible moment and as the InfoQ article I link to, I think this is at the core of being agile.

Working in very small batches isn’t without complication though; the concerns that I am suggesting in the above statements are born out of a bias towards a project based delivery mentality, the alternate to which is a product based one. In the environments that I have worked within it has always been difficult to group products in a meaningful way which in my experience has often lead to product owner groups all of whom need to compete to get their work done which has often ended up leading to frustration. From an engineering perspective, working in smaller increments also increases the need for two things: automated test suites that provide early visibility of breaking changes and also remove the strain on those people that fulfil a testing role in respect of doing the need to do full regression testing. Secondly, an automated release process is needed so that time spent doing releases is reduced.

I should be clear in all of this that I am not suggesting that planning does not add value. The value it brings though is in support of the actual delivery of the requested work. It’s biggest benefit is from a communication perspective and in that respect, it should be something that is recognised as being constantly out of date and therefore that it has a cost associated with it in keeping it up to date.

I mentioned flow specifically in one of my responses and that is point is worthy of expansion. If you imagine a magnet being passed above a bed of iron filings, a weak magnet would have less of an affect on the filings than a magnet with a stronger field would. Similarly small features individually passing through a team’s pipeline have less probability of causing a disruption. In relation to batching, Little’s Law suggests that the more items in a pipeline at any given time has the net effect of slowing (or to use the analogy, disrupting) the other items also in that pipeline. Finally, when you focus on individual features it makes it much easier in my opinion to understand where the bottlenecks are in your system Having the knowledge and then understanding why they exist is the first step in being able to focus efforts on  setting your pipeline up to work as effectively as possible with those bottlenecks.