Private key errors from Logstash when shipping logs over SSL using filebeat

We use the filebeat shipper to ship logs from our various servers, over to a centralised ELK server to allow people access to production logs.

Because the logs are shipped from various data centres, the filebeat shippers are configured to send logs using SSL. To achieve that, we followed the how-to guide. For the curious, that means that each of our servers has a filebeat.yml configuration file containing the following excerpt,

output.logstash:
    hosts: ["hostname:5044"]
    ssl.certificate: "/etc/ssl/certs/our.crt"
    ssl.key: "/etc/ssl/private/our.key"

and the inputs config file on our ELK server contains the following,

input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/ssl/certs/our.crt"
    ssl_key => "/etc/ssl/private/our.key"
  }
}

I upgraded Logstash this morning, from 6.2 to 6.4 and found that Logstash wouldn’t start. When I inspected the Logstash logs I found errors as follows:

[2018-08-30T10:58:50,842][ERROR][logstash.inputs.beats ] Looks like you either have a bad certificate, an invalid key or your private key was not in PKCS8 format.

I checked the filebeat shipper logs too, and found errors similar to the following:

2018-08-30T11:26:02Z ERR Connecting error publishing events (retrying): read tcp ip.address.here:45896->ip.address.here:5044: read: connection reset by peer

In actual fact, I needn’t have gone further than the error that I found in the Logstash error log which said, “your private key was not in PKCS8 format”.

The SSL key that we had shipped to the box was not in PKCS8 format, which according to the documentation is a requirement. The fix in the end was simple, and was provided in this comment against a GitHub issue. I needed to create a PKCS8 version of our existing key. To do that, I ran the following command,

openssl pkcs8 -in /etc/ssl/private/our.key -topk8 -nocrypt -out /etc/ssl/private/our.p8
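If you're unsure whether a key is already in PKCS8 format, the PEM header gives it away: a traditional OpenSSL RSA key begins with "BEGIN RSA PRIVATE KEY", whereas a PKCS8 key begins with "BEGIN PRIVATE KEY". The sketch below rehearses the conversion end to end on a throwaway key rather than the real one (it assumes openssl is on your path):

```shell
# Generate a throwaway RSA key to stand in for our.key, then convert it to
# PKCS8 exactly as above, and inspect the resulting PEM header.
tmpdir=$(mktemp -d)

openssl genrsa -out "$tmpdir/test.key" 2048 2>/dev/null

openssl pkcs8 -in "$tmpdir/test.key" -topk8 -nocrypt -out "$tmpdir/test.p8"

# A PKCS8 key's first line carries no algorithm name
head -n 1 "$tmpdir/test.p8"
```

The converted file's first line should read `-----BEGIN PRIVATE KEY-----`.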

To use the new PKCS8 version of the key, I also needed to update the beats input file to use the new our.p8 key file and then restart Logstash. The revised config is as follows:

input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/ssl/certs/our.crt"
    ssl_key => "/etc/ssl/private/our.p8"
  }
}

Hope this helps someone else!

Getting Image and Flavor information from the Rackspace API

At Gamevy we’re lucky enough to have been accepted onto the Rackspace UK Startup Programme. In combination with that, we’ve been using Ansible to deploy new servers, and perform application deployments.

When you create a new server at Rackspace using the Ansible rax module, there are two pieces of information you’ll need from Rackspace. One of those is the flavor; this defines the instance type (the configuration of the instance – e.g. how much RAM it is given). The second is the id of the image (operating system) you wish to use to create the server.

To date, we’ve not found an easy way to get this information. Our current method is to use cURL to get an access token, and then make further requests to the API to get the required information.

Given this is an infrequent task, I normally forget the necessary steps. I thought that by noting them down I might save myself, and others, some time in the future. This post assumes that you already have a Rackspace account, and that you have API access via a key which is associated with your account. If that’s not the case, Sign Up here, and then follow these steps to get your API key.

It’s worth noting that this post focuses on accessing functionality provided by Rackspace UK. If you’re accessing from the US, the steps are the same, but you’ll need to change the URLs that are accessed. Full information about Rackspace’s API can be found at their documentation site.

Getting An Auth Token

First things first, let’s get an Auth Token. To do this, you’ll need a few pieces of information from your Rackspace account: your username, your API Key and your account number.

Login to your Cloud Control Panel, and in the top right-hand corner of the screen you’ll see your username with an arrow pointing down next to it. Clicking on that will reveal a menu, at the top of which will be your account number. Note that down; you’ll need it when you make calls to the API later.

From the menu, select Account Settings. On the resulting screen, reveal your API key and note that too.

Create a file on your machine; call it whatever you want (I called mine rackspace-creds.json, and put it in my home directory for ease of access). The file should contain the following:

{
    "auth":
    {
        "RAX-KSKEY:apiKeyCredentials":
        {
            "username": "YOUR_RACKSPACE_USERNAME",
            "apiKey": "YOUR_RACKSPACE_API_KEY"
        }
    }
}

To get your Auth Token, open up a terminal and issue the following command

curl https://identity.api.rackspacecloud.com/v2.0/tokens -X 'POST' \
-d @rackspace-creds.json -H "Content-Type: application/json" | python -m json.tool

I’ve piped the output of the cURL command through Python’s json.tool to prettify it. In the response you receive, you’ll need to grab the token’s id value:

"token": {
    "RAX-AUTH:authenticatedBy": [
        "APIKEY"
    ],
    "expires": "",
    "id": "YOUR_AUTH_TOKEN",
    "tenant": {
        "id": "",
        "name": ""
    }
}
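Rather than eyeballing the prettified JSON, you can pull the token id out programmatically. In the full v2.0 identity response, the token object sits inside an access object. The sketch below fakes a minimal response file to show the idea (response.json is a hypothetical filename; python3 is assumed to be available – in practice you would save the real curl output to the file instead):

```shell
# Fake a minimal identity response purely for illustration
cat > response.json <<'EOF'
{"access": {"token": {"id": "YOUR_AUTH_TOKEN", "expires": ""}}}
EOF

# Extract just the token id
python3 -c "import json; print(json.load(open('response.json'))['access']['token']['id'])"
```

With a saved real response, the same one-liner prints the id you need for the X-Auth-Token header.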

List The Flavors And Images Available To You

Now that you’ve got an Auth Token, you can go ahead and make calls to the API endpoints for Flavors and Images. I’ve pasted the corresponding examples below:

Flavors

curl https://lon.servers.api.rackspacecloud.com/v2/YOUR_ACCOUNT_ID/flavors \
-H "X-Auth-Token: YOUR_AUTH_TOKEN" | python -m json.tool

Images

curl https://lon.servers.api.rackspacecloud.com/v2/YOUR_ACCOUNT_ID/images \
-H "X-Auth-Token: YOUR_AUTH_TOKEN" | python -m json.tool
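The responses from both endpoints can be lengthy, so it helps to reduce them to just the id and name of each entry – the two values you’ll feed into the rax module. A sketch, again assuming python3 and using a minimal, made-up flavors.json in place of the real response:

```shell
# Minimal fake of the flavors response; in practice, save the real curl output
cat > flavors.json <<'EOF'
{"flavors": [{"id": "2", "name": "512MB Standard Instance"},
             {"id": "3", "name": "1GB Standard Instance"}]}
EOF

# Print the id and name of each flavor
python3 - <<'EOF'
import json
for flavor in json.load(open('flavors.json'))['flavors']:
    print(flavor['id'], flavor['name'])
EOF
```

The same loop works for the images response if you swap 'flavors' for 'images'.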


Iterating an ansible task using a dictionary object of variables

Recently we have been writing Ansible scripts to deploy one of our applications. In doing so, we came across the need to configure three Node.js processes as services. All of these processes used a similar init script but required different values to be written into the script.

We chose to use the template task, thinking that with the four variables we needed to populate on each occasion we should be able to use the with_items method on the task to do what we wanted. We did plenty of Googling, couldn’t find a way forward, and so were all set to revert to writing three separate implementations of the template task, and three associated templates.

Asking a question on the ansible IRC channel eventually yielded an answer and I thought I would share our implementation.

We start with a generic template, service_init_script.j2. The script itself is not important to this blog post; what you should note is the way in which the Ansible (Jinja2) variables refer to item.foo – this was our initial stumbling block. Each time the task iterates, the current entry from the with_items block is referred to as item:

#!/bin/bash
### BEGIN INIT INFO
# Provides:          {{ item.service_name }}
# Required-Start:    $all
# Required-Stop:     $all
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: {{ item.service_description }}
### END INIT INFO

# Taken loosely from https://gist.github.com/tilfin/5004848
prgcmd={{ item.service_exec }}                 # What gets executed?
prgname={{ item.service_name }}                # What's the name (used to ensure only one instance is running)
prguser={{ item.service_user }}                # Which user should be used
pidfile=/var/run/{{ item.service_name }}.pid   # Where should the pid file be stored?
logfile=/var/log/{{ item.service_name }}.log

export BJA_CONFIG={{ bja_config }}

start() {
  if [ -f $pidfile ]; then
    pid=`cat $pidfile`
    kill -0 $pid >& /dev/null
    if [ $? -eq 0 ]; then
      echo "{{ item.service_name }} has already been started."
      return 1
    fi
  fi

  echo -n "Starting {{ item.service_name }}"
  nohup start-stop-daemon -c $prguser -n $prgname -p $pidfile -m --exec /usr/bin/env --start $prgcmd >> $logfile 2>&1 &

  if [ $? -eq 0 ]; then
    echo "."
    return 0
  else
    echo "Failed to start {{ item.service_name }}."
    return 1
  fi
}

stop() {
  if [ ! -f $pidfile ]; then
    echo "{{ item.service_name }} not started."
    return 1
  fi

  echo -n "Stopping {{ item.service_name }}."
  start-stop-daemon -p $pidfile --stop

  if [ $? -ne 0 ]; then
    echo "Failed to stop {{ item.service_name }}."
    return 1
  fi

  rm $pidfile
  echo "."
}

status() {
  if [ -f $pidfile ]; then
    pid=`cat $pidfile`
    kill -0 $pid >& /dev/null
    if [ $? -eq 0 ]; then
      echo "{{ item.service_name }} running. (PID: ${pid})"
      return 0
    else
      echo "{{ item.service_name }} might have crashed. (PID: ${pid} file remains)"
      return 1
    fi
  else
    echo "{{ item.service_name }} not started."
    return 0
  fi
}

restart() {
  stop
  if [ $? -ne 0 ]; then
    return 1
  fi
  sleep 2
  start
  return $?
}

case "$1" in
  start | stop | status | restart)
    $1
    ;;
  *)
    echo "Usage: $0 {start|stop|status|restart}"
    exit 2
    ;;
esac

exit $?

We then wrote our ansible script and call the template task thus:

- name: Create the service init scripts
  template: src=service_init_script.j2 dest=/etc/init.d/{{ item.service_name }} owner=root group=root mode=0755
  with_items:
    - { service_name: paymentHandler, service_description: 'Handles payments', service_user: blackjack_attack, service_exec: "{{ application_directory }}/apps/paymentHandler.js" }
    - { service_name: tablePool, service_description: 'The pool of tables available', service_user: blackjack_attack, service_exec: "{{ application_directory }}/apps/tablePool.js" }
    - { service_name: userManager, service_description: 'Manages users', service_user: blackjack_attack, service_exec: "{{ application_directory }}/apps/userManager.js" }
  sudo: yes

So, we are calling the same template task, which in turn refers to the same jinja2 template, passing in a collection of dictionary objects.

The key point to understand here is that when the playbook is run, Ansible enumerates the items specified, referring to each object as ‘item’. This is the reason why we had to prefix the variable names in the template with item.*
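As a crude shell analogy for what Ansible is doing, imagine rendering the same template once per item, substituting the item-specific values each time (sed stands in for Jinja2 here; this is purely illustrative, not how Ansible itself is implemented):

```shell
# One line of the template, with a placeholder in Jinja2-ish syntax
template='# Provides: {{ service_name }}'

# Iterate over the "items", rendering the template for each one
for svc in paymentHandler tablePool userManager; do
    echo "$template" | sed "s/{{ service_name }}/$svc/"
done
```

Each pass through the loop produces one rendered copy, just as the template task produces one init script per dictionary in with_items.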

I’ve exposed both of these pieces of code as gists here, and here.

Do we need clearer roles and responsibilities?

Why are people still preaching that better, more clearly articulated roles and responsibilities will improve productivity? Provide a transparent environment where people have a clear understanding of what they’re working toward, and its value, and you’ll get better results.

As I write this, I’m sitting on a train travelling home. I’ve just been through the blog posts in my RSS feed. Three offer very similar advice about how to improve an individual’s productivity in the workplace. Their answer to this perennial conundrum is that we need to provide clearer objectives and roles and responsibilities on a daily basis. Given that the posts originate from people promoting Agile as a more enlightened way of working, I shake my head in wonder. Really, I do.

The value of roles

There are times when clear roles and responsibilities are essential – this post is not to suggest that their use be completely dropped. Consider the outcome of the incident now known as the Miracle On The Hudson; having just taken off from New York City’s LaGuardia Airport, a US Airways aeroplane, flight 1549, struck a flock of geese. Both engines failed. Some 3 minutes later, the Captain and his crew, a First Officer and 3 Flight Attendants, had performed an emergency landing on the Hudson River. In an air crash, speed, accuracy of trained response and calm are paramount. Although the First Officer had been flying at the time, the Captain took the controls, leaving the officer to go through a 3-page emergency procedure to restart the engines. The three flight attendants showed passengers how to brace themselves for impact and took control of evacuating the plane, even as it filled with water. The captain walked up and down the sinking aircraft to be sure no-one had been left behind and was the last person to leave. Each of the crew members was helped by their training and their knowledge of their specific, unique roles and responsibilities.

The difference for IT

The problem is that few business teams – IT or otherwise – face the same kind of challenge as existed for the air crew. In IT, problems tend to require a creative response: one that cannot be considered in advance, which no training will prepare us for, but which must be invented anew each time. Strict roles and responsibilities don’t help with this. Indeed, they may hinder it because instead of thinking freely, people follow a prescription. Control does not sit well with creativity, originality or motivation. Attempting to control an individual in a group setting reinforces a silo-based mentality to work, which means that the attempt to control an individual through a strictly defined role and associated responsibilities affects the whole system. In my opinion it compels people to work within the boundaries that are set for them – intentionally or otherwise.

Do rigid role boundaries assist in speed?

In the air crash example, knowing who was responsible for each task helped speed. Proponents of clear roles and responsibilities suggest that decisions are made faster and are accepted with less confusion. Yet in my experience one of the negative results is a reduction in collaboration. The classic case of this, which I am sure we have all experienced, is somebody justifying their inaction with: “well, it’s not my responsibility”. I’ve known many cases where people fought NOT to take responsibility for an unpleasant job, or fought over who TOOK responsibility for a prestigious job, and the result was always that progress slowed or sometimes stalled as collaboration evaporated.

Individuals can always change the rules

So far I’ve focused on the work. But software development is ultimately a people-orientated endeavour. I was thinking about this as I sat with my son, a four-and-a-half-year-old, to do his numeracy homework. For some reason he was writing a lot of his numbers back to front. I kept on trying to correct him and at one point said to him “just write me five nines”. His response was to write the numbers 5 and 9. Perfectly, I might add. Some might say it showed intelligence. I’m inclined to say that he was being subversive. Either way, it was a proud Dad moment. But that’s one of the problems with the advice being offered: people don’t like being told what to do. They either do what they think is best at the time, or, in a small number of cases, intentionally subvert the system.

Demotivating boundaries

When we’re children, recognition from our dads and teachers motivates us; as we grow older, the influence of our peers is what counts. I tend to perform best when I know I am there to make a defined contribution. To this end, a degree of definition of my role and responsibilities helps, but a rigid boundary is demotivating, because it constrains my ability to contribute as much as I could or would like to. This demotivation is exacerbated when people have to discover where their responsibilities lie. I’ve heard people refer to this as being knocked back – they’ve tried to do something which is perhaps of interest to them, or where they perceive there to be an opportunity, only to be told that their contribution wasn’t wanted.

Doing things differently

So what instead? As with all pernicious problems the first step is to observe what is happening. You might find people in a stand up talking about the piece of work they’re doing, rather than the work the team needs to do. Perhaps you see executives or departments becoming more rigid in role demarcation. A lack of vision or sense of purpose might pervade the organisation. Worst of all, perhaps no-one thinks making things better is any part of their ‘responsibility’. Cause and effect is difficult to establish in a software development scenario, but you should be able to state your observations and assumptions. Establish what effect an improvement might have. Test your assumptions with others – what’s their opinion? Don’t fall into the trap that it is your role to change people. Remember, the way in which people behave is far more likely to be driven by the environment around them than anything else. Once there is agreement in the group about the perceived problem, consider what you might change to bring about improvements. Test whether these improvements have had the desired effect. Respond to the feedback you receive. Rinse and repeat.

I know that those advocating clarity of roles and responsibilities see the same problems in demotivation and wasted efforts as I do. The difference is that they believe more control is the answer, while I believe that less direct control frees people to create their own solutions. Those 5 crew members on US Airways flight 1549 worked as a team, each within their specialisms, and brought off “the most successful ditching in aviation history”[1]. Software development teams need to be more cross-functional in nature, but I believe that they too can perform their own miracles – as long as they have an environment that supports them rather than dividing them.

Thanks go to the inimitable Joshua James Arnold, Lady H of Edmonton (but of no fixed Twitter abode), and the sartorially precise Mark Krishan (50 shades of) Gray for helping me to improve this blog post.

[1] New York Post, 2009. Quiet Air Hero Is Captain America. [online] Available at: <http://www.nypost.com/p/news/regional/item_Goem4fAiUd2hsctASfAjGJ>. [Accessed 28 November 2012].

Using The Voice Of the Customer As Input For A Retrospective

Previously, I have worked with a team whose overall effectiveness had been called into question by the people the team interfaced with. Upon further investigation it became clear that the perception those people had of the team was largely based on anecdote and yet, as the lead of the team said, perception is reality.

The lead and I sat down to discuss this and agreed to substantiate some of the comments being made, and the process that we would follow to derive actions we could then take towards discernible improvements. I think that what we ended up with is an interesting way of gaining input into a team’s retrospective, contrary to the way I have seen it done previously: the team sourced opinion from outside the team rather than internally. That is why I wanted to share it.

We started by identifying the customers of the team and given that this was the first instance of running this process, we also chose to keep the group size small, identifying just the key individuals with an agreement that in subsequent iterations, we would extend the group.

We designed a small survey comprising 4 questions that we would ask of the individuals:

  1. What is your perception of the team’s performance?
  2. On a scale of 1 to 10, how would you rate the performance of the team?
  3. Given the rating that you have just given the team, how could we make ourselves a 10?
  4. Can you suggest any ways in which we could measure the effect of the improvements that you suggest?

Questions 1 and 2 were actually of little significance in respect of helping the team to improve; their purpose was more to provide some context to the presentation that would be made back to the team after the interviews had been conducted.

Question 3 gave the person being interviewed an opportunity to make specific recommendations for how the team could improve and so, in our opinion, was the most important. Question 4 was less important, though it did serve to make the people being interviewed examine the feedback they had given in response to question 3, and to offer the team some sense of measures that would be meaningful to their customers.

With this data, the team lead conducted a Retrospective. It started with a presentation of the information that had been collected. The team was told from the outset that this was data that had been collected and that while they may disagree with some of the comments made, it was the perception of others and that therefore they needed to accept it as it was.

The remainder of the Retrospective was split into three sections: an idea-generation session titled “What Changes Can We Make That Will Result In Improvement?”; a filtering session which looked at the ideas generated and asked “How Will We Know That The Change Is An Improvement?”, the purpose of which was to filter out those ideas that could not be measured one way or another; and finally an estimation session. In the last, the team estimated, on a relative points scale, the value an item would have in terms of impacting their performance, and then the effort involved in bringing about the change. The value estimate was then divided by the effort estimate to give an indication of the Return on Investment, and the item with the highest return was selected for action by the team.
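The ranking step amounts to a simple division. As a toy illustration – the item names and point values below are entirely hypothetical – with value and effort both on relative points scales:

```shell
# item, value points, effort points -> ROI = value / effort, highest first
printf '%s\n' 'improveBuild 8 3' 'pairMore 5 2' 'automateDeploy 13 8' |
awk '{ printf "%s %.2f\n", $1, $2 / $3 }' |
sort -k2 -rn
```

Whichever item tops the sorted list (here, improveBuild at 8/3 ≈ 2.67) is the one the team takes forward.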

Experimentation: A Pattern for Affecting Measured Change

Note: This post is based almost entirely on one of the things that I took away from a talk titled “Agile Coaching Tips” by Rachel Davies, held at Skills Matter, but perhaps overlooked at the time. As such, the originating ideas are not mine but Rachel’s. This post is more something that I have constructed as a result of my own recent thinking, driven primarily by my interest in the kanban community and by talking with people like Benjamin Mitchell, which has led to me growing more appreciative of the value of qualitative and quantitative measures. This in turn has led me to revisit one of the subjects that Rachel introduced in her talk, “Making Change As An Experiment” (see between 38:36 and 42:09 in the video I have linked to).

Why Experiment?

Rachel proposed, and I agree with her, that talking about change with somebody can often result in them becoming wary, whereas using different nomenclature – in this case talking of experimenting with something – can make the people involved, or potentially affected, feel more comfortable. In that respect, I like the idea of using experimentation as a means to effect change in what I do, primarily because it considers up front the feelings of the people involved in that change. So often, in my experience, that not insignificant factor is overlooked.

Secondly, when thinking about experiments that are run in a laboratory, it is accepted that you may fail to meet your objective; you’re questioning whether you can change some thing’s existing state. In relation to that, because you had a hypothesis that you were looking to prove or disprove, you will have had to measure the outcome of any actions you performed and the conditions they were attempted under.

With those two points in mind, I’ve taken Rachel’s original idea and noted how I think it could be used as a pattern to bring about measured improvements.

Experimentation As A Pattern

1. Understand the Current State

The first step, in my opinion, is to understand what it is you are attempting to achieve overall (your vision). From there, the next step is to identify the first step you want to take towards that vision – perhaps the first impediment you want to remove, or the first improvement you want to make. Techniques such as Retrospectives, the 5 Whys and A3 Thinking, amongst others, can help here but above all, the one tool that I keep coming back to currently is Force Field Analysis.

2. Hypothesise

The received wisdom is that hypothesising should be the process used to understand the goals of the experiment, with which I would agree. What is important to me here, though, is that the goal should be something that can be clearly articulated to anyone; in the case of a development team, that includes those outside the team from a non-technical background. It is worth writing the goals down too; this is not something that I have personally done, but I do see that it might have brought value to previous change efforts.

It is particularly key here that, if the experiment you wish to run relates to a practice within a team, everyone within the team understands why you want to introduce a new practice or change an existing one. Furthermore, it is important in my opinion that everyone in the team has an opportunity to discuss how you might change. This puts everyone in a position to question the validity of their own approach, and that of their peers, on a daily basis throughout the experiment. In my opinion, it is for all those involved in the experiment to ensure that it runs as defined.

3. Plan The Experiment / Decide On How To Measure Effect Of Change

The idea that any change in practices or process should be measured is, as I describe above, one of the most important aspects of why I think experimentation is a useful way of introducing change. Based on the findings from Understanding the Current State and the goals derived from the work done whilst Hypothesising, you should be able to define a set of quantitative or qualitative measures. It would be ideal in most cases to identify measures that you already have in place so that you can compare and contrast, but this is not essential. In the absence of existing measures, I would suggest extending the duration of the experiment so that you can observe change over time.

By way of a simple example, let’s imagine that you, or the team you work within, notice that features are batching when they get to the tester, and perhaps even that in a lot of circumstances the features end up having to be reworked. The overall effect of this is that the team is unable to concentrate on other work and that the features themselves take unnecessarily long to deliver. Some measures you might consider are: the amount of rework, for example the number of times that a feature is passed between a developer and a tester; the number of bugs raised against a feature whilst it is being developed; and the Cycle Time for features.
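Cycle time in particular is cheap to calculate once you record when each feature was started and finished. A toy sketch, with entirely hypothetical features and dates (it assumes GNU date for the -d flag):

```shell
# feature, start date, finish date -> elapsed days per feature
printf '%s\n' 'featureA 2018-08-01 2018-08-08' 'featureB 2018-08-02 2018-08-05' |
while read -r feature start finish; do
    days=$(( ( $(date -d "$finish" +%s) - $(date -d "$start" +%s) ) / 86400 ))
    echo "$feature: $days days"
done
```

Tracking how that number moves over the course of the experiment gives you one of the quantitative measures discussed above.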

Finally, decide how long the experiment will run for. As I mention above, be mindful of how long you will need to gather data that will be statistically significant. If you already have a good set of data to compare any new metrics with, then you can (though not necessarily should) use a shorter period of time; conversely, in the absence of any data to start with, I would suggest you consider leaving the experiment to run for as long as possible.

4. Run the Experiment

As I’ve suggested in the sections above, I think the most important things to be concerned with whilst running the experiment are as follows. Have a clear vision for what you’re trying to achieve and be able to articulate a set of discrete, measurable goals for the experiment. Ensure, on a best-endeavours basis (acknowledging that some people will always be difficult to persuade of the benefits of one practice over another – see Diffusion of Innovation, the Five Stages of the Adoption Process), that everybody in the team is signed up to the experiment and knows that they have a role to play within it. Time-box the experiment, so that those who aren’t necessarily persuaded of its virtues know that it is for a set time, and so that the amount of energy expended on an invalid hypothesis is minimised. Take time at regular intervals to ensure that you remain true to the practices or process improvements that you are trying to bring about; without this, the experiment will be invalidated.

Finally, at the end of the experiment, take another opportunity to review what happened throughout, not just from the perspective of the measures that you have in place but from the perspective of those involved too: how did everyone involved feel the experiment went, and what benefits did they feel it brought? In addition, my advice would be to be as transparent as possible with all affected parties about the measures that you are using and the results that you publish. I think this is important if for no other reason than it might generate further discussion, particularly outside of the team.

5. Iterate

It’s important to recognise that what is learnt from the experiment is the most important thing, as opposed to whether or not you succeed in meeting your goals. If your hypothesis was proved, that’s great; if it wasn’t, ask yourself whether there is still value in what you are trying to achieve. If there is, can you change your previously defined practices to derive the value you are hoping for? Importantly too, are the measures that you defined still valid? Are there any that could be replaced, or added, to allow you to better understand performance against your goals?

In Conclusion

I am sure that using experimentation as a pattern for change is something that others, including Rachel, have suggested before, so what I am proposing is nothing new. It is also without doubt that making change using this technique adds overhead to the change process itself. I do firmly believe, though, that using experimentation to introduce change is beneficial: its focus on measurement allows those involved to be objective in their appraisal of any impact and, lastly, it considers the people involved up front, acknowledging that most people, when confronted with it, are wary of change.

Output from the Session I Proposed at the UK Agile Coaches Gathering, 2010

At the recent UK Agile Coaches Gathering (held at the magnificent Bletchley Park), the theme for which was Helping People Grow, I proposed a session titled “Leaving a Legacy: How Do You Leave an Environment in Which a Team Can Continue to Grow?” for one of the slots during the Open Space. There follows a summary of the session and then the notes that I captured, augmented from memory where possible.

To start, I thought that I should provide an explanation as to why I proposed the session. In recent months, I have heard an awful lot of stories about change efforts that were fantastically successful, but when one thing changed – perhaps a person leaving the organisation or the structure of the team changing – the momentum from that effort went too, and in some circumstances those involved then moved backwards to their previous state. Furthermore, whilst I have never held the title of Agile Coach, from personal experience there seems to be a ceiling within organisations which, once reached by a team, makes further change more difficult; typically because the team needs to challenge hierarchies or the received wisdom within that organisation.

Prior to the session, I did some preparation and thought about how, as a coach, you ensure you leave a legacy. I came up with the following:

As a coach, you need to help a team learn how to change but that at the same time, you need to help an organisation learn how to allow that team to change

I then listed a couple of points for each of Team and Organisation that I thought would ensure a legacy of a culture of continual change. From a Team perspective, I noted that primarily they need to be taught how to be introspective and secondly, linked to the first point, that you need to provide them with a tool set to facilitate their continued change. From an Organisation’s perspective, I noted that you need to help them understand their purpose, to help them understand the value they want from the team and, lastly, how to communicate both of those things to the team.

Once I’d set the scene with the above, we moved into the discussion, the notes of which follow:

  • The team need to get to a point where they have the courage to stand up for what they believe in
  • A coach should not become too embedded within a team
    • Keeping that distance reduces the team’s reliance on the coach
  • Internal learning within a team leads to a team encountering blockers outside the team
  • Teams can be too busy building stuff to have a vested interest in removing the impediments outside their team
    • The above is most apparent in Scrum when the Scrum Master is delivery focussed (e.g. a Project Manager or Lead Developer) as opposed to being focussed on Process Improvement
  • It is key that the team needs to feel as though they own the process
    • Process Smell: 1 person owning the process
  • A lot of the examples given by the attendees centred around organisations that were hostile to any change in the first place
  • Even if you have somebody that really cares about improving the team, people, including those outside the team, need to experience the benefit before they will actually assist in further improvement
  • (Team reaching a ceiling) Awful lot of organisations that can’t articulate their vision
    • This means that they struggle to say no to additional workload, with the consequence that the organisation is just busy, with no slack
    • No tools to resolve conflict
  • It was noted that it is harder to help an organisation define and communicate their values when they have existed for a long time
  • Is the answer, in terms of leaving a legacy that you leave a team that have a clear set of values that they believe in?
  • To test whether in a coach’s absence the team is still being successful the following might be considered to be good indicators:
    • Are they still working as a team
    • Are they changing things
    • Are they still focussing on quality, internal and external
    • Are they learning
    • Are they innovating
    • Can it still be said that they have confidence
    • It was noted that the tests above are valid at an organisational level too
      • e.g. Kanban boards spreading outside of IT departments
  • Differentiating factor? If teams or the organisation is stressed?
  • See image below, showing how trust of a team is derived by others, thanks to Rachel Davies
    • The reciprocal, though, is that the others need to know that you as a coach know what you’re doing
  • We should accept that change efforts are incredibly fragile. e.g. When a new leader enters an organisation they will always want to make their own change
    • Credibility comes from evidence: gut feel is not enough
  • Is transparency part of the answer?
  • Are the goals of the team and the organisation aligned? If not, changes in practices will never flourish
  • Is the thing that is missing conversation? In a hierarchical structure is the ability to hold conversations across all levels there?

Image: Trust Equation

Thanks to all the participants who came along to the session and made it what it was. I enjoyed it and learnt something too; hopefully the same can be said for yourselves.

Lessons Learned Working in a Service Company Servicing Large Projects

I was fortunate enough recently to attend LeanCamp, an unconference organised by Salim Virani and aimed at people interested in the Lean Startup meme. I was reminded the other day of one of the sessions I attended. “Lessons Learned Bootstrapping a Service Company” was run by Chris Parsons about his experiences in starting his company, Eden Development.

Of the conferences that I have been to recently, I’ve found myself gravitating more towards sessions that take an experience report format. I like them as they’re nothing more than observational. Whilst they do often focus on one subject in particular, the people presenting are giving you an opportunity to listen to their experiences, good and bad, of trying to do something one way or another, not just trying to sell you a mentality or ideology. I do find sometimes that they let themselves down in that the evidence provided of the effects is anecdotal instead of quantitative. Happily, this was a less formal session and so that wasn’t a concern.

At 7digital we’re starting to near the end of the first phase of a project that is large in comparison to the normal size of those that we do. To be specific, we’ve been involved in the project since June last year and have been actively working to deliver features for the client since early November. It has consumed, at a minimum, 4 developers full time and, for a couple of months, 8. As I suggest, not necessarily large in comparison to other projects within the industry, but large for us.

I was reminded of Chris’ session as I start to reflect on what I have learned in delivering this project. Chris outlined 5 key lessons in a slide deck and then spoke about them. I thought that I might use those headings as the basis for my own blog post in respect of what I have learned.

1. Learn How to Say No

Chris’ point in the presentation was that he wished that he’d learnt to say no more often to pieces of work that he took on in the formative stage of Eden.

I’m certainly not in a position to be able to make those decisions outright at 7digital nor for that matter would we as a company want to; we’re in a privileged position that allows us to work with some great clients on interesting pieces of work that suit all concerned.

In the case of this project in particular, owing to the client’s incumbent delivery process, we’ve had to accept working in a way that is contrary to the way in which we’re happiest delivering software. In contrast to Chris’ point, I think what I’ve learnt, or at least had reiterated to me, is that saying yes to something but in doing so asking why is very powerful. On a number of occasions I’ve been able to ask why a practice is used and in doing so understand 2 things: the context that surrounded the practice and, secondly, the way in which the practice is viewed. With those bits of information I’ve been able to bring some subtle but nonetheless tangible change to the project and hopefully have contributed to making it more successful as a result.

2. Hire Passion over Experience

I don’t have notes with the specifics of Chris’ points here but remember agreeing wholeheartedly with him. I think he made the point that there was a correlation between those that were passionate about development and both the breadth of their skill set and their ability to pick up, say, a new language.

There isn’t really anything within this point that I can speak about that isn’t only tenuously linked, and so I will leave this section without further comment.

3. Spend Where the Value Is

On this point, Chris spoke about the fact that at Eden, rather than having invested in tailored office space from the outset as so many other startups do (along with presumably, micro scooters / space hoppers / foosball tables etc), they invested in things that mattered such as decent chairs and machines for their staff.

In a departure from our typical approach to development, on this project we did a lot of up-front analysis, some 2 or 3 months’ worth, which at the time felt a little uncomfortable. Reflecting upon this now, I think we did about the right amount and, instead of just documenting requirements, we spent a lot of time challenging them in order to understand the problem that was being solved. We asked the client to give us ways in which we could measure the success of the software we were to deliver and, most importantly, we tried to understand the smallest piece of work we could do to satisfy the client’s requirement. This is not to say that we’ve done half a job, far from it. Whilst we were constrained to an extent by the project bias this piece of work took, we have attempted to embed an incremental approach where possible. I think that this effort was worth the investment of our time and money; we haven’t to date included the work done in any bill, but it has paid for itself in my opinion. We ended up with a very discrete set of functionality that services a requirement and nothing more. We didn’t end up developing software that isn’t going to be used, and the software that we have delivered is for the most part measurable, aiding our desire internally to automate all of our acceptance tests.

4. Cashflow

Here, Chris spoke about how having some savings as a company has helped Eden move from a position where they were chasing all types of work, even work that was underpaid, to one where they can afford to be a little more selective about the type of work or the clients that they work with.

For us, in respect of this project, it’s certainly not work that we would have turned away. It has been a great piece of work to be involved in. My main lesson here has been that there is a hidden cost, often not considered, associated with development of this size in a department the size of ours. As I stated at the beginning of this article, this project has utilised up to 8 developers at a time, which equates to roughly a third of the department. Because the department is structured as 4 product teams and this work spanned 2 of them, it heightened the need for effective communication and collaboration between the teams, which has a cost associated with it. When considering work of this relative scale again, I will be sure to take that into account.

There is also the matter that undertaking a large body of work such as this could have presented problems in our ability to undertake other, smaller pieces of work; we might have missed out on opportunities elsewhere. Thankfully we haven’t, or at least we haven’t noticed any that we have missed.

5. Have a Higher Purpose

Finally, Chris spoke about the fact that in setting up Eden his desire was always to be building great software for clients. He went on to say that building a company is made up of a thousand small decisions and that it’s important to let the people who work for you make those decisions, though the only way that will be successful is to have a clear vision that everybody knows and understands. He suggested that he has been known to “pop quiz” people within Eden on it.

In the Development Team here at 7digital we’re continually striving to produce software that is the right thing and is built to a high standard of quality. From the moment we set out on this endeavour we have strived, aided for sure by the green-field nature of this work, to deliver against those aspirations, and based on the measures that we have in place internally, I think we succeeded. One example that I would cite is this: we found that there was a misunderstanding, owing to assumptions, in everybody’s understanding of the acceptance criteria for one of the features. Unfortunately this wasn’t caught until fairly late on. Given that we had a suite of unit, integration and system level tests, we were able to quickly understand the impact of this new requirement and go on to make the change. Having worked on many different projects throughout my career, I’ve rarely before been afforded such confidence that a requested change would be successful and have so little impact. Anecdotal evidence, I know, but enough for me in this case.

Not Time Boxing or Commitments but Managing Risk by Using SLAs

Recently, I’ve been working closely with one of the teams here and consequently have led some changes to the dashboard that they keep for use in their planning sessions with stakeholders, and for my benefit in being able to track the costs associated with any client development we’re doing. Having kept this data for the team for about a year now, it’s proved incredibly useful.

Owing to the nature of the stakeholders that provide direction in respect of the team’s pipeline, we’ve often seen situations where new features have been prioritised over features that have been in the backlog for some time, something that is quite normal I’m sure. In turn though, this has meant that by the time the team comes to start the work pertaining to the features that entered the backlog first, the delivery date that was originally suggested to the external client will have already passed. In those cases, whilst the feature has been estimated using “T-Shirt” sizing, we’ve only then tracked how long the team took to do the work. This is not a problem in itself, but a recent observation made by the team was that they felt it didn’t necessarily help them make informed decisions in terms of how to deliver the feature.

A while ago, the team stopped using iterations, opting for more of a flow-based model. We’ve been implying a Work in Progress limit by restricting the number of streams available to feature work but, because of the Hidden Work previously discussed, amongst other things, this often failed to focus the team in respect of delivery too.

In a recent prioritisation session with the stakeholders it was decided that we would focus efforts on improving the performance of a couple of the endpoints within an API. It was decided that, owing to the exploratory nature of the work, the developers undertaking it would do a time-boxed spike.

It was at this point that I started seeing a possible way of getting the team to focus in on a delivery date. At the time, I thought that it would be reasonable to use the data relating to the estimates that we’d previously made to also time box future feature work. As can be seen in the image below, in the team’s dashboard we display the minimum, maximum and standard deviation for all features within a T-Shirt size. I considered extending that so that we used it to make a commitment to our clients and stakeholders that, for example, anything deemed to be a medium would be done in 19 days, the mean for that size.

Days Taken, Committed to Done by T-Shirt Size Including Variance
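For illustration, the per-size figures on such a dashboard can be derived from raw cycle times with a few lines of standard-library Python. The durations below are made up for the example, not the team's actual data:

```python
from statistics import mean, stdev

# Hypothetical days from "committed" to "done", grouped by T-shirt size
durations_by_size = {
    "small": [3, 5, 4, 7, 6],
    "medium": [12, 19, 25, 16, 23],   # mean works out at 19 days
    "large": [30, 41, 55, 38],
}

for size, days in durations_by_size.items():
    # min/max show the spread; stdev quantifies the variation within a size
    print(f"{size}: min={min(days)} max={max(days)} "
          f"mean={mean(days):.1f} stdev={stdev(days):.1f}")
```

With real data, the interesting signal is usually how large the standard deviation is relative to the mean: a "medium" with a mean of 19 days but a wide spread makes any single-number promise risky.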

Thankfully, I realised the mistake I was making. I was suggesting that a team should take a piece of work which, granted, they had done some analysis on, and make a commitment that they would deliver it in a set period of time. This is tantamount to asking a team to commit at the beginning of an iteration to delivering a certain amount of functionality, based on whatever measure they are using at the time. To tend towards an iteration-based delivery mentality is to tend towards one of two types of failure, in my opinion: under-commitment from the team, so that they know they have more chance of meeting the expectations placed upon them, which is good for their morale but not great for overall productivity; or, alternatively, making a commitment which the team has no way of knowing that they can meet, which can be demoralising for the team over time.

So What Instead Then?

The solution for us was born out of the fact that we keep so much data in relation to the features that we have delivered.

Recently, we have started to chart the data in a different manner to the one above, displaying the time taken for all features from when the team committed to doing them, to when they were released to the production environment (i.e. they were done).

All Features, Committed to Done

Visually, this is not necessarily an immediate improvement. The obvious thing that does now present itself though is just how much variance there is in the size of the features that the team is doing.

The chart is improved though when series displaying the Mean and Standard Deviation are added, see below:

All Features, Committed to Done with Mean and Std Deviations

However, it is still just a chart showing a lot of variation (which, when you consider that it covers all the features ever delivered by the team, you would expect) and the Standard Deviation, which needs no explanation. Some might question what use the above is at all; well, the benefit for us is in reinforcing the volatility and inherent risk in any estimate. This is complemented massively by the table below.

Table Showing Cycle Item by Type

Above, the information we use the most is the Number of Items (the size of the sample set), the Average Plus Standard Deviation (the average Cycle Time for an item with an allowance for some risk) and the Percentage of Items Greater than the Average Plus Standard Deviation (the risk in using that number as the basis for any estimate).

When combined, this data allows us to do one important thing for our stakeholders. In the absence of any deadlines, we can say that we endeavour to deliver any item, according to its type, within a certain timeframe (the arithmetic mean for the sample set plus 1 standard deviation); hence the team are defining an SLA.
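As a sketch of the arithmetic behind the table, using invented cycle times rather than the team's real data:

```python
from statistics import mean, stdev

# Hypothetical cycle times in days, committed to done, for one item type
cycle_times = [4, 6, 9, 11, 13, 14, 17, 21, 24, 35]

avg = mean(cycle_times)
sd = stdev(cycle_times)   # sample standard deviation
sla = avg + sd            # the SLA: mean plus 1 standard deviation

# The risk indicator: what share of historical items exceeded the SLA
pct_over = sum(1 for t in cycle_times if t > sla) / len(cycle_times) * 100

print(f"n={len(cycle_times)} mean={avg:.1f} stdev={sd:.1f} "
      f"SLA={sla:.1f} days, {pct_over:.0f}% of items over SLA")
```

For this made-up sample the SLA comes out at roughly 24.7 days with 10% of past items exceeding it; that percentage is exactly the "risk in using the number" that the table surfaces.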

What’s important to note here, though, is that the SLA is not a commitment; it is information relevant to the team to enable them to focus on the risk that is always present in delivering software. These dates are currently being used by the team in planning exercises as indicators, and by putting the dates on the cards that go through the wall (using a post-it note on top of the card) to help them focus on delivery and, perhaps more importantly, to allow them to manage the risk on a daily basis. At the stand up, when talking about cards, they can focus specifically on how to increase their chances of delivering the feature within SLA.

It’s early days for this experiment and there isn’t enough data as yet to suggest that it is either succeeding or failing. Anecdotally though, it seems to be having a positive impact.

The Cost of Hidden Work

I’ve been working with one of the teams here recently all of whom have been feeling that the pace at which they deliver features is slower than they would like. I’ve run a number of sessions in order to gather data about the work they’ve been doing and how it flows through their wall. During one of the sessions, we focussed on the types of work that they are asked to do of which support was noted as one flavour.
Image: The team's wall showing the support stream

The team currently structure their board around the ongoing need to service this support need; see an image of their current implementation, left. One of the 5 developers within the team will work solely within the Support Stream, seen at the bottom of their wall, for an undetermined amount of time.
The support items are typically raised through our issue management application and usually carry an expectation that they will be worked on immediately, irrespective of their size. The team has tried implementing an SLA around response times for support items before but, for one reason or another (discussion of which is outside the scope of this article), it has never really taken hold. (We have in the sessions discussed Classes of Service, details of which will be covered in a subsequent post.)
Back to the team’s opinion that they weren’t delivering as fast as they might like, though; in our discussions I stated that I felt that part of the reason was that, in undertaking the support items, they were undertaking a lot of what was effectively hidden work. It was work that was also implicitly being prioritised above all the other items that had previously been agreed as highest priority in a prioritisation session.
We didn’t manage to reach a conclusion on what to do about the support stream though and so the conversation continued outside of the meeting. One of the Developers in the team noted what he saw to be the pros and cons of the support stream which with his permission, I’ve noted below:

Pros

  • less distraction to the rest of the team
  • less confusion who’s looking into what
  • faster responses
  • fixed amount of resources dedicated to support – support cannot slow up feature streams
  • 1 person dealing with each support request = less overhead in understanding the problem

Cons

  • support rota not evenly distributed amongst team members
  • faster responses = less pressure on business to ask for automated solutions
  • pre-sale enquiries don’t fit well, so would still cause distraction
  • 1 person dealing with each support request = issues are hidden from rest of the team

I then followed up with a rather lengthy response which is the reason this post has emerged:

My issue with support is this; support items are essentially features that are being requested. In that respect, they should be prioritised among all the other features. The fact that there is a specific stream for support suggests that they are automatically accepted as being higher priority than the other features in progress and that is certainly not the case. Furthermore in prioritising them above the other features those other features are affected.

I think that it is far more prudent to prioritise the work you are doing on a daily basis, as a team, and to include the support items in that prioritisation. In my opinion, it is not so important that the rest of the team are not distracted, as per the first point in the pros section above; what is important is that you are, as a team, working on the highest priority items. Only then will you ever work together as a team to establish a flow. In that, I recognise that there is absolutely value in disrupting the flow of another item to get a major bug fixed, as there is also value in understanding why you are dealing with a support request; and in the case of the why, you will only ever understand that by having a discussion with the other members of the team. I would hope that once you understand why, as a team, you can then incorporate any learning into an improvement cycle on a far more regular basis than waiting for your next retrospective, at which time the item may have either been forgotten in the larger context or not voted the highest priority thing to be attended to.

To the points above, though; I’ve already touched on the first point in the pros section and, as for the second point, I think that a clear definition of the role of the person acting in a floating capacity will clear this up. On the third and fourth points, in respect of faster responses: as I’ve touched upon above, this may well be the case for the items that you now classify as support, but the net effect is that the other items are slower to deliver. I don’t have any data to substantiate this claim, but I will point you at Little’s Law, which suggests that the number of items in process is closely related to the overall time taken to process those items. On the fifth point, this is in direct conflict with our ambition to deal with one of the biggest problems we face as a department, that a select few know more about our software and systems than anyone else, as well as the received opinion that two heads are better than one.
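Little's Law, in its usual queueing form, says that over the long run the average number of items in progress equals the average throughput multiplied by the average time each item spends in the system. A minimal illustration with invented numbers, not the team's measurements:

```python
# Little's Law (long-run averages): WIP = throughput * cycle_time
# All figures below are made up for illustration only.

throughput = 2.0    # features finished per week
cycle_time = 3.0    # average weeks from committed to done

wip = throughput * cycle_time
print(f"average items in progress: {wip}")   # 6.0

# The flip side argued in the text: for a fixed throughput, pushing WIP up
# (e.g. by always starting support items immediately) pushes cycle time up.
new_wip = 9.0
new_cycle_time = new_wip / throughput
print(f"cycle time at WIP {new_wip}: {new_cycle_time} weeks")   # 4.5
```

In other words, the support stream does not make work faster overall; it trades faster support responses for longer delivery times on everything else in progress.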

In terms of the cons listed above, the first point I may not understand fully, but if I do understand it, surely a weekly rota for a floating role that fulfils a number of duties (which we can define as a group, but for example: monitoring the support queue, and leading the stand up in the morning in discussing the blockers, the support requests that have come in and the work that is in progress) would give a fair split of the responsibilities. I think I agree with the second point made, but I would add this: the “business” (a term we should not allow to propagate, lest we fall deeper into a them-and-us mentality) classifying a support item as a feature, and having a conversation about its priority and how to achieve it as a team, which is what I am proposing, would I hope lead to other solutions being suggested and indeed the emergence of those automated systems bit by bit. If as a team you recognise that you have a little longer to undertake the item, schedule it among the other things that you are doing, deliver it in a timely manner, but put the building blocks in place for something that you can extend as time goes by. I’m in agreement with the third point about pre-sales enquiries, as are we all I think, as we’ve now included it in the activities that we map on the board. Pre-Sales is a first class activity that everybody should have visibility of and should be able to undertake.

How Has This Concluded?

Firstly, the support stream hasn’t been removed from the team’s wall (as yet). They do now have a rota in place, though, that clearly shows who is acting in that capacity. Furthermore, and most importantly in my opinion, they now prioritise the work within the context of the other features and don’t necessarily respond immediately, but at a time that is convenient. They’ve also stopped tracking support items specifically, opting instead to consider them features which are sized, thus giving us more reliable data about lead and cycle times and, importantly, actual visibility of the capacity the team has to do work.