Social surveys in the developing world

Robert Chambers, sociologist of development, writing about social science surveys in the developing world:

“As data collection is completed, processing begins. Coding, punching and some simple programming present formidable problems. Consistency checks are too much to contemplate. Funds begin to run out because the costs of this stage have been underestimated. Reports are due before data are ready. There has been an overkill in data collection; there is enough information for a dozen Ph.D. theses but no one to use it. Much of the material remains unprocessed, or if processed, unanalysed, or if analysed, not written-up, or if written-up, not read, or if read, not remembered, or if remembered, not used or acted upon. Only a minuscule proportion, if any, of the findings affect policy and they are usually a few simple totals. These totals have often been identified early on through physical counting of questionnaires or coding sheets and communicated verbally, independently of the main data processing.”

Robert Chambers [1983]: Rural Development: Putting the Last First. London, UK: Longman. p. 53.

A salute to Flo Skelly

Watching Season 2 of Mad Men, with its arc of the rise of a female copywriter (Peggy Olson, played by Elisabeth Moss), I was reminded of that real pioneer woman in advertising, Florence Skelly, who died in 1998 aged 73. I never had the good fortune to work with her, but I have worked with lots of people who did. The stories about her were legion. I recall especially hearing about a series of detailed presentations she gave in the mid-1990s on the attitudes and aspirations of teenagers (those in what we would now call late GenX and early GenY), a group she seemed to know better than any other researcher around. The irony was that she herself was at the cusp of her eighth decade!
Interestingly, Season 1 of Mad Men had a couple of scenes involving market researchers, but the one woman was a PhD psychologist with a Central European accent, apparently unable to be creative and clearly instantiating a different (albeit then-common) archetype from Flo Skelly.
On Mad Men, a reminder that Ta-Nehisi Coates, mashing Karl Rove, last October captured the demographic of the typical viewer with great precision:

Even if I’ve never met you, I know you all. You guys are that dude at the country club with the beautiful date, holding a martini and a cigarette, standing against the wall and making snide comments about all the CSI-viewers who pass by. And you’re also a Muslim. Can’t forget Muslim.

Chicago – this is your moment, too

The election of Senator Barack Obama as President of the USA has brought to the fore his adopted home-town, Chicago, a prominence now reinforced by his selection of Chicago-based Congressman Rahm Emanuel as his White House Chief of Staff. Chicago, hog-butcher to the world, was known first in the 19th century for its dominance of the meat industry, and then for its dominance of the markets for other agricultural commodities. In the 20th century this led to dominance of the financial markets where such commodities, and later more sophisticated financial products, were traded. With all this money, it is not surprising that the world’s first modern skyscrapers were built there too.
But Chicago has also been a centre for business consulting – for example, via Arthur Andersen (founded Chicago, 1913), and its spin-off Andersen Consulting (now Accenture) – and a centre for marketing research and marketing data analysis. That particular thread includes AC Nielsen (founded Chicago, 1923) and Information Resources, Inc. (IRI, founded Chicago, 1977). The three founders of IRI, John Malec, Gerald Eskin and William Walter, sought to take advantage of newly-deployed supermarket scanners to analyse tactical marketing data for fast-moving consumer goods (FMCG). (Modern supermarket scanners began operations in the US from June 1974.)
But there is an earlier fibre to this thread: before the invention of the electronic computer, Chicago was also a centre for the manufacture of adding machines. Data, and its analysis – practical, no-nonsense, mid-western, even – has been a key Chicago strength.
Peggy A. Kidwell [2001]: Yours for Improvement – The adding machines of Chicago, 1884-1930. IEEE Annals of the History of Computing, 23 (3): 3-21. July 2001.

A data architecture for spimes

Thinking some more about spimes, those product entities that exist individually in space and time, I can see they could lead to major changes in the way in which marketing data is collected, collated, stored, analyzed, and used. Clearly, individual spimes and their wranglers will generate a lot of data as they interact with the world and report back (eg, via RFID and GPS), and that data could usefully form the basis for marketing knowledge and marketing action. But the web changes everything. Spime wranglers, being intelligent human beings and companies, could comment and reflect on their interactions; the social web allows them to meet each other, across space and across time, in the same way that a houseowner can “meet” the previous or future occupants of his house. Likewise, intelligent spimes could also reflect on their interactions, and even wrangle less-intelligent spimes.
What software architecture is appropriate for this mass of data?   Clearly, we’d want to store all the data, regardless of its format, in databases.  My question is pitched at a higher level of abstraction than that of the databases.  We desire that multiple, independent agents (both people and devices) are able to access the data, to read it and contribute to it, and maybe to over-write it (assuming they have the appropriate authorizations).  Moreover, we want to be able to combine and reason-across the data generated by one spime, say a particular motor vehicle, with that of other spimes — say, other vehicles of the same model, or other vehicles owned by the same person, or other vehicles purchased in the same year, etc.   We’d also like to combine and reason-across the data generated by spimes in different product categories — all the durables purchased by the Smith family in their life, for instance, or all the products purchased in Main Street, Anytown, last week.
An obvious data architecture for multiple, independent reading- and writing-entities is a blackboard. A blackboard architecture is a shared memory space which enables agents sending and receiving messages to be decoupled from one another, both spatially and temporally. Exactly as with a physical blackboard, messages left on the blackboard are stored until they are erased, and so the long-dead can communicate to the living, who can in turn communicate to the not-yet-born. Tuple spaces and the associated Linda language are an example of a blackboard architecture (implemented in Java as JavaSpaces). We could imagine that each spime has its own tuple space, partitioned into secure sub-spaces for different spime-wranglers, from manufacturers, through each spime owner or carer, to after-sales service providers and disposal agencies. Access to spaces will need to be controlled, so that only authorized agents may write, read and erase data in their allocated partition. Here we could use something called Law-Governed Linda, an enhancement of Linda designed to add security features, although this may be too rigid for products whose uses cannot be readily predicted in advance. An architecture allowing access to a tuple space following an appropriate dialogue between the relevant agents may be more flexible.
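To make the idea of a partitioned spime-space concrete, here is a minimal sketch in Python. It is a toy in-memory illustration only, not the JavaSpaces API and not Law-Governed Linda: the class name SpimeSpace, the partition names and the agent names are all hypothetical, and access control is reduced to a simple list of authorized agents per partition. The operation names follow Linda's out (write), rdp (non-blocking read) and inp (non-blocking read-and-erase).

```python
from typing import Optional


class SpimeSpace:
    """Toy tuple space for one spime, split into per-wrangler partitions.

    Hypothetical illustration: not a real Linda or JavaSpaces implementation.
    """

    def __init__(self, spime_id: str):
        self.spime_id = spime_id
        self._partitions: dict[str, list[tuple]] = {}
        self._acl: dict[str, set[str]] = {}  # partition name -> authorized agents

    def add_partition(self, name: str, agents: set[str]) -> None:
        self._partitions[name] = []
        self._acl[name] = set(agents)

    def _check(self, agent: str, partition: str) -> None:
        if agent not in self._acl.get(partition, set()):
            raise PermissionError(f"{agent} may not access {partition!r}")

    @staticmethod
    def _matches(template: tuple, candidate: tuple) -> bool:
        # None in a template acts as a wildcard field.
        return len(template) == len(candidate) and all(
            t is None or t == c for t, c in zip(template, candidate)
        )

    def out(self, agent: str, partition: str, item: tuple) -> None:
        """Write a tuple; it stays until some authorized agent erases it."""
        self._check(agent, partition)
        self._partitions[partition].append(item)

    def rdp(self, agent: str, partition: str, template: tuple) -> Optional[tuple]:
        """Non-blocking read: return the first matching tuple, or None."""
        self._check(agent, partition)
        for item in self._partitions[partition]:
            if self._matches(template, item):
                return item
        return None

    def inp(self, agent: str, partition: str, template: tuple) -> Optional[tuple]:
        """Non-blocking read-and-erase of the first matching tuple."""
        found = self.rdp(agent, partition, template)
        if found is not None:
            self._partitions[partition].remove(found)
        return found


# Usage: one vehicle spime with a manufacturer partition and an owner partition.
car = SpimeSpace("vehicle-123")
car.add_partition("manufacturer", {"acme_motors"})
car.add_partition("owner", {"smith_family"})
car.out("acme_motors", "manufacturer", ("built", 2008, "plant-7"))
car.out("smith_family", "owner", ("odometer_km", 15200))
print(car.rdp("smith_family", "owner", ("odometer_km", None)))
```

Because writers and readers only ever touch the space, never each other, the manufacturer's tuple can be read years later by an agent that did not exist when it was written. A fixed access list like this one is exactly the kind of rigidity noted above; a dialogue-based scheme would replace the _check method with a negotiation between agents.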
So far, so good for the data storage and access. But spimes and spime wranglers will generate enormous quantities of data, and analyzing all this data will require some effort. Better, then, to plan for this effort and automate as much of the data collation, aggregation, processing and analysis as possible. Here, I suggest we should use so-called Tuple Centres, which are intelligent Tuple Spaces, able to reason over the data they hold. Because we will want to combine and analyze data arising from different spimes, these tuple centres will need to communicate with one another, and agree (or not) to allow their data to be aggregated. A multi-agent system (MAS) with agents representing each spime-space (ie, the tuple space of each spime), and, for many spimes, each partition of each spime-space, seems the most effective architecture. This is because the interests of the relevant stakeholders (spime-wranglers, marketing departments, manufacturers and service providers, data protection agencies, the state, the law) will vary and a MAS is the most effective way to formally represent and accommodate these diverse interests in a software system.
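As a rough illustration of the tuple-centre idea, the following Python sketch gives each spime a store that reacts to every write by updating a local summary, and that agrees (or declines) to share this summary with a requesting agent. Everything here is an assumption made for illustration: the names TupleCentre, track_total_km and aggregate_km are hypothetical, the "agreement" is reduced to a simple consent set, and none of this reflects the actual API of any tuple-centre platform.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TupleCentre:
    """Toy tuple centre: a tuple store plus reactions fired on each write."""
    spime_id: str
    share_with: set[str] = field(default_factory=set)  # hypothetical consent policy
    reactions: list[Callable[..., None]] = field(default_factory=list)
    tuples: list[tuple] = field(default_factory=list)
    summary: dict[str, float] = field(default_factory=dict)

    def out(self, item: tuple) -> None:
        """Store a tuple, then let each reaction process it."""
        self.tuples.append(item)
        for react in self.reactions:
            react(self, item)

    def request_summary(self, requester: str) -> Optional[dict[str, float]]:
        """Release a copy of the local summary only to consented requesters."""
        if requester in self.share_with:
            return dict(self.summary)
        return None


def track_total_km(centre: TupleCentre, item: tuple) -> None:
    # Reaction: keep a running total of trip distances for this spime.
    if item[0] == "trip_km":
        centre.summary["total_km"] = centre.summary.get("total_km", 0.0) + item[1]


def aggregate_km(centres: list[TupleCentre], requester: str) -> tuple[float, int]:
    """Sum total_km over the centres that agree to share with the requester."""
    replies = [c.request_summary(requester) for c in centres]
    granted = [r for r in replies if r is not None]
    return sum(r.get("total_km", 0.0) for r in granted), len(granted)


# Usage: vehicle A consents to sharing with the analyst, vehicle B does not.
car_a = TupleCentre("vehicle-A", share_with={"fleet_analyst"}, reactions=[track_total_km])
car_b = TupleCentre("vehicle-B", reactions=[track_total_km])
car_a.out(("trip_km", 12.5))
car_a.out(("trip_km", 30.0))
car_b.out(("trip_km", 100.0))
total, contributing = aggregate_km([car_a, car_b], "fleet_analyst")
print(total, contributing)  # 42.5 1
```

In a fuller multi-agent design the consent set would give way to negotiation between agents representing each stakeholder, but the division of labour is the same: each centre does its own local processing, and cross-spime aggregation happens only over what the centres agree to release.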
There are many details still to be worked out for this architecture. But even at this level, it is clear that the traditional marketing data warehouse architecture is not sophisticated enough for what is needed for spimes. Hence my statement above that spimes could lead to major changes in the way in which marketing data is collected, collated, stored and analyzed. Use of spime data I will leave for another post.
TuCSoN, developed at the University of Bologna, Italy, is a platform which enables fast implementation of tuple centre applications.