Methodology and process post (Part III of Building culture in decentralized organizations)

tl;dr: “Culture” is mysterious and goopy, but valuable and important in decentralized organizations. By taxonomizing across the systems of people who build it for a living, we can help communities be more intentional about building shared meaning together.

I’m Seth Frey, a professor at UC Davis who uses large datasets to understand community building and self-governance online. I’ve been running a study of culture building practices among professional culture builders, in both traditional centralized organizations and more informal communities and networks. In this post I’ll describe the work of the research (with links to the raw materials) and some higher-level takeaways. If you just want a listicle to run off with, the fourth and fifth posts will serve you well. If you want to understand how those lists were formed, what claims to truth they have or don’t, what assumptions they are based on, and just how hard it is to do anything resembling science on these questions, this post will give you a good sense of how to read the others, grounded in a very specific definition of what we’re really going for when we’re going for “good culture”.

With the background (overall approach, language, and landscape) from the interviews in Post #2, my team built a standard form and recruited several research students to read books by professional consultants who have worked across enough organizations to develop a general approach or philosophy. The rationale behind both is to use the words and writing of a person as a lens into their mental model. I used the results from that “book report” effort to extract practitioners’ similarities and generate recommendations. Much of the most important work is very subjective, and not exactly separable from me and how I am. Nevertheless, I’m better than most at signposting the distortions I introduce into an analysis, and I’m open to your criticism, comments, and concerns.

For most of this, when I say “I” I do mean “we”: portions of this work were performed by Taylor Ferrari (private qualitative researcher focused on tech; webpage, LinkedIn), particularly on the interviews and the development of our book report prompts, and Beril Bulat (Ph.D. student at UC Davis), particularly on managing the research assistants.

Literature selection

The first step was to build a corpus of mental models of professional culture builders for comparison. These were mostly consultants for traditional organizations. I conducted a broad-based search for books about culture building in organizations, the criteria being that the author(s) work in or with organizations or communities to build culture as we define it, and have worked with enough organizations to have developed and published a general system. Here are the books we selected. We developed the list by drawing leads from our interviews, from recommendations by other personal contacts, by searching for both book titles and the names of authors (often well-known consultants whose names had come up repeatedly), and with some impromptu additions from Amazon recommendations: everything that fit the requirements made the list. It is not a conventional choice in qualitative research or mapping exercises to favor “popular press” books over academic writing (at least not in my world). But given our goals—to gain insight into how experienced practitioners think about their work—this was the right choice.

The rationale behind our method was to use books by domain experts as a model or approximation of the interview process, and thereby simulate interviews with the most experienced and well-regarded practitioners in the culture building space. We treat the system reported in each book as a reflection of the mental model that they have developed in their previous work.

Of course, these books aren’t just lenses into mental models: they are sometimes self-help books and often marketing materials for the author’s consulting practice. This could make the books less reliable, if the authors are describing what they think readers want to hear at the expense of describing their actual practice and approach. One (of many) assumptions we are boldly starting from is that, whatever the motivation, these books are sincere attempts to describe each author’s actual systems and opinions.

Supporting systematic comparison

We then assigned the books to student volunteer research assistants along with a worksheet. We developed the worksheet based on the picture that we gained from the pre-interview stage, by iterating on our interview prompts. The worksheet and raw reports are here. It encoded certain assumptions that we converged on after the interviews, such as the guiding assumption that the systems reported by the books would be comparable to each other: that they describe “the same kind of work.”

We instructed students to read the worksheet and continuously refer to it while reading. They were asked to stay close to the text in their answers (provide quotes), and to distill key concepts relating to each author’s goals for an intervention (what qualities do they want the org to manifest?) and approach (what tools or practices are they endorsing to help the org manifest those qualities?).

Students (all working full-time on their courses) had about 6 weeks to finish their books and submit their notes.

Systematic comparison

After this I went through the reports, picking out keywords and themes that seemed to reflect each person’s theory. This was a subjective “mind mapping” type process: creating provisional categories, then merging and splitting them as I expanded to more books, or as I got the sense that some didn’t fit.

I found the inherent subjectivity of this process to be uncomfortable. One thing that redeems it is that there were some striking similarities across books. For all their differences, some things are quite stable across practitioners. Take for example these definitions of the word “culture” from four of the eleven books we reviewed:

  • “the shared assumptions, values, behaviors, and artifacts that determine and reflect ‘how and why we do things around here’” — Heskett, “Win from Within”

  • “Our definition of culture: ‘assumptions, expectations, beliefs, social structures, or values guiding behavior’” — Briody, Trotter, and Meerwarth, “Transforming Culture”

  • “A set of beliefs and resultant observable behaviors that determines—more than any other factor—the performance of the group.” — DeMarco “Happy to Work Here”

  • A culture is “a set of shared experiences, values, and goals that is unique to every… team observed” — Fitzpatrick and Collins-Sussman “Debugging Teams”

For a word as squishy as “culture”, these independent definitions reflect surprising agreement about what it is. I say “independent” a bit provocatively. Certainly they may have influenced each other. Everyone talks about classics like the “Fifth Discipline” in Biblical terms. Likely there is a standard definition of culture that each is riffing off of. If so, these definitions are dependent in the sense of coming from a common source. But even if they are not independent in the sense of being developed independently, they are still independent in the sense of being retained independently: each author chose to keep a standard definition when they could have chosen otherwise. This is important to establish, because if we are treating each book as a data point, we need a credible way to claim that they are independent data points, not, for example, the result of 11 books by 11 pupils of a single influential thinker.

Making calls without fooling yourself

My primary training is in quantitative research (data science and behavioral experiments), not the qualitative methods that I based this work on (specifically free interpretation of interviews and other forms of self-report). To be honest, I find it really uncomfortable. Would another human looking at the same data come to similar conclusions? I won’t try to answer that, but I’m doing the next best thing: making it answerable to you if you care, by providing my sources at every step of the way. You are certain to trust some of my decisions more than others. Hopefully the worst you’ll be able to say of the method is “That’s a little shaky, but I can see why he made that decision.” In general this work is not a demonstration of virtuoso qualitative investigation, but overall it was conducted at a level at which it could be published in an academic venue. Instead, I will post here, which frees me to write less formally and to focus more on applications than theory.

Despite the risks and downsides of qualitative interpretation, I think that the comparative “mind mapping” process was worth doing. By distilling the qualities and practices that practitioners focus on, we can compare across books. By distilling common themes across books we can gain general insight.

Figure 1: Picture of full mind map

The internals, inputs, and outputs of “culture”

These people, with very different philosophies and backgrounds, united only by experience across organizations, have all independently developed a sense of culture change. On what common threads have they converged? For example, four of the seven dimensions we extracted from this process are shared identity, shared behaviors, shared purpose, and shared values. Are those real? Are they distinct, or just synonyms, or hand-wavy stand-ins for a deeper unstateable je ne sais quoi? Depending on context, some of these could be redundant (how independent are identity and values?), while others, being more mutable (such as behaviors), may be seen less as results of culture building and more as inputs to it. Should these be combined, thinned, or broken down into even more qualities?

Figure 2: First look at common dimensions of culture (developed in Post #4)

Figure 3: First look at common tools of culture (developed in Post #5)

According to our analysis, one book distinguished all four as distinct concepts, four distinguished three of the four (all retained “values”), and five named just two of the four (Figure 2). Although this leaves many gaps, and suggests that many authors collapse these concepts, their choices of which to keep and which to drop are surprisingly even, with each of those four dimensions identified as an important dimension of culture by 6 or 7 of the 11 authors (more than half). Our takeaway is that this method, using trade books to perform a comparative analysis of consultants’ mental models, can credibly be used to identify shared concepts and common themes that experts agree on.
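The tally behind these numbers is simple enough to reproduce. The sketch below uses illustrative data: the book labels and per-book dimension sets are hypothetical, constructed only to match the pattern reported here (one book names all four dimensions, four name three, five name two, one names none of these four). The real codings live in the linked worksheets.

```python
from collections import Counter

# Hypothetical codings, constructed to match the reported pattern.
# Each entry records which of the four shared dimensions a book named.
book_dimensions = {
    "book01": {"identity", "behaviors", "purpose", "values"},
    "book02": {"behaviors", "purpose", "values"},
    "book03": {"behaviors", "purpose", "values"},
    "book04": {"identity", "purpose", "values"},
    "book05": {"identity", "behaviors", "values"},
    "book06": {"identity", "behaviors"},
    "book07": {"identity", "behaviors"},
    "book08": {"identity", "purpose"},
    "book09": {"behaviors", "values"},
    "book10": {"purpose", "values"},
    "book11": set(),  # coded with none of the four shared dimensions
}

# Count how many of the 11 books name each dimension.
tally = Counter(dim for dims in book_dimensions.values() for dim in dims)

for dim, count in tally.most_common():
    share = count / len(book_dimensions)
    print(f"{dim}: {count}/{len(book_dimensions)} books ({share:.0%})")
```

With data matching the pattern above, each dimension lands at 6 or 7 of 11 books, i.e. the "6 or 7 out of 11 authors" agreement described in the text.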

Limitations of the approach

This summary should also make it clear how far we are from an engineering level of certainty. Under our analysis, the qualities with the strongest support as general dimensions were identified by at most 7 of the 11 books (about 60%). By an engineering standard, 60% agreement on a fundamental law is not good. You could say that this is a problem with the method (using popular books, or these popular books) or the researcher (me), but ultimately these are just hard things to study. They are squishy. By social science standards, 60% agreement is actually pretty impressive. In social science there’s a lot of lowering your bar for how solid you expect the most solid knowledge to be.

That said, I’ll be the first to admit that every step of this process had the potential to introduce error and bias. Here is an inexhaustive list of problems with our approach:

  • The procedure for building the book criteria may be arbitrary (I just came up with something)

  • The procedure for selecting the books may have introduced bias

  • … as may the procedure for building the prompts that undergrads filled out

  • … and the procedure for recruiting undergrads (we took who came needing units)

  • … as well as the procedure for deciding which 11 of the ~20 books to assign (lacking enough researchers to read all books, we gave out the shortest and best fitting books first)

  • … and it was maybe not a great idea to assign two books by the same author.

  • Several undergrads clearly struggled to comply with the instructions (some of the reports are low quality)

  • The undergraduates likely had subtle differences between them in how they interpreted the prompts,

  • Or how expansive they were: do I have Heskett as covering more dimensions just because his report was a bit longer than the average?

  • There was obvious subjectivity in my process for translating reports to shared factors

  • Similarly, the processes of defining, filtering, and merging factors shared across all books was very subjective (read “objectively arbitrary”).

(This list is also a preview of my rhetorical strategy for addressing the criticisms and concerns that you end up coming up with; in a nutshell: “Anything you can fault, I can fault better.”)

Again, these problems are due in part to my acceptance that this process can’t be both rigorous (high inter-rater reliability) and successful (produce stable categories). So there’s a certain amount of casualness or self-forgiveness in my execution (though this likely meets the bar for publishable qualitative research in some venues). But the bigger factor in any problems is going to be the difficulty of the task of extracting social insights from a large number of informants.

The purpose of this post was to set up the upcoming posts by walking through the work and the rough shape it has taken. I’m sharing the work products and details of the method less because I expect anyone to replicate it and more for clarity and transparency. I’ll be really energized by questions about the findings in the upcoming posts. In the meantime I’m looking forward to fielding your questions, comments, and concerns on the approach.

Thank you for supporting this work with your interest and engagement.

Seth Frey (Home, Twitter)


Thank you @enfascination for this amazing work on building culture in decentralized organizations.

This cracked me up a bit. :grinning::grinning:

However, one of the limitations of the approach you listed is that several students clearly struggled to comply with the instructions, resulting in low-quality reports. Don’t you think this may have affected the counts, such that the qualities with the strongest support as general dimensions were only identified by about 60% of books?

PS: I can see that this work doesn’t rest entirely on report quality. Just a thought playing around in my head.


I agree that the low quality reporting probably led to underreporting rather than overreporting or misreporting. But it didn’t occur to me until you just pointed it out: underreporting in this scenario is the best of the three because it introduces conservative bias (rather than anticonservative bias or noise). This means that if I’m wrong as a result of bad student reports, I’m wrong in the sense of understating the results instead of overstating them. That’s generally good practice in social science, especially in today’s post-replication-crisis environment, where people are much more afraid of false positives (saying something is there that isn’t) than misses (saying something isn’t there that is).

So yes, there is a chance that the cross-author agreement in dimensions is greater than 60% and that I’m being conservative by pinning it there. Given the uncertainty in empirical work generally, and the inevitability of one of those three types of error (understatement, overstatement, and noise), I’m perfectly happy erring on the side of understatement. Great observation! Thanks a ton!
