
Big Data's Big Unintended Consequences

Review Draft of 13 January 2013

SUPERSEDED - SEE THE PUBLISHED VERSION

Marcus R. Wigan & Roger Clarke **

© Xamax Consultancy Pty Ltd, 2013

Available under an AEShareNet Free for Education licence or a Creative Commons 'Some Rights Reserved' licence.

This document is at http://www.rogerclarke.com/DV/BigData-1301.html


Abstract

The concept of Big Data is not new, and neither are its consequences. What has changed during the last quarter-century, however, is the diversity of sources of data about people, and the intensity of the data trails that are generated by their behaviour. Exploitation of Big Data by business and government is being undertaken without regard for issues of legality, data quality, disparate data meanings and process quality. This results in poor decisions, the risks of which are to a large extent borne not by the organisations that make them but by the individuals who are affected by them. The threats harboured by Big Data extend far beyond the individual, however, into social, economic and political realms. New balances must be found.


1. Introduction

Big Data has been coming for years.

A quarter-century ago, dataveillance was identified as a far more economic way to monitor people than physical and electronic surveillance (Clarke 1988). The techniques of the early years, such as front-end verification and data-matching, were soon extended. Profiling involves a set of characteristics of a particular category of person being inferred from existing data-holdings, with other individuals who have a close fit to that set of characteristics being singled out for attention (Clarke 1993).
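
As a minimal sketch of the matching step that profiling entails - with field-names, weights and threshold invented for illustration rather than drawn from any actual scheme - the inferred set of characteristics can be treated as a scoring rule applied to every record in a data-holding:

    # Hypothetical sketch of profile-based screening: score each record
    # against an inferred set of characteristics and flag close fits.
    # Field-names, weights and threshold are invented for illustration.
    PROFILE   = {"cash_deposits_per_month": 8, "overseas_transfers": 3, "age_band": "25-34"}
    WEIGHTS   = {"cash_deposits_per_month": 0.5, "overseas_transfers": 0.3, "age_band": 0.2}
    THRESHOLD = 0.7   # chosen by the analyst, not empirically validated

    def profile_score(record):
        """Return a 0..1 similarity between a record and the profile."""
        score = 0.0
        for field, target in PROFILE.items():
            value = record.get(field)
            if isinstance(target, (int, float)) and isinstance(value, (int, float)):
                # Numeric fields: closeness relative to the target value.
                score += WEIGHTS[field] * max(0.0, 1.0 - abs(value - target) / max(target, 1))
            else:
                # Categorical fields: exact match or nothing.
                score += WEIGHTS[field] * (1.0 if value == target else 0.0)
        return score

    def flag_for_attention(records):
        """Single out individuals whose records fit the profile closely."""
        return [r for r in records if profile_score(r) >= THRESHOLD]

    suspects = flag_for_attention([
        {"id": 17, "cash_deposits_per_month": 7, "overseas_transfers": 3, "age_band": "25-34"},
        {"id": 18, "cash_deposits_per_month": 0, "overseas_transfers": 0, "age_band": "55-64"},
    ])
    print([r["id"] for r in suspects])   # [17]

Everything of consequence lies in the choice of profile, weights and threshold; the individuals who are singled out have no visibility of any of them.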

Following the development and application of neural networks and other rule generation tools (Wigan 1986), a larger scale process emerged. The eternal search for a new term to excite customers and achieve sales led to the notion of `Data Mining'. This framed the data as raw material, and the process as the exploitation of that resource to extract relationships that have been hidden because they are subtle, complex or multi-dimensional (Furnas 2012).

The promotional term in use during the current decade - `Big Data' - has been evident in the formal literature since the 1990s. The term's original use appears to have been in the physical sciences, where economics has dictated that computational analysis and experimentation complement and even supplant costly, messy physical laboratories. The techniques have found application in other disciplines, and given rise to the field of computational social science.

More recently, the term has been grasped as a mantra by government agencies, with the expectation of attacking waste and fraud, and by law enforcement and national security agencies promising yet more and yet earlier detection of terrorists. Corporations, meanwhile, see Big Data as a prospective tool for commercial advantage, most critically in consumer marketing (Ratner 2003). For an indicative `spruik', see McKinsey (2011). Much of the populist management literature is expressed in vague terms, but some of it deals with specific cases. For reviews, see Craig & Ludlof (2011), Dumbill (2012) and Osborne (2012).

This paper commences with a summary of some key aspects of the Big Data movement. It then reviews a number of specific contexts in which Big Data is being exploited, in order to identify unintended consequences of the activity. Its focus is not on data about physical phenomena, but rather on data that relates to individuals who are identifiable, or to categories of individuals.


2. The Political Economy of Big Data

Some important and commonly-overlooked presumptions underlie the wave of Big Data enthusiasm. This section considers in turn the factors of legality, data quality, data meaning and process quality.

In some cases, a Big Data collection may arise from a single coherent and consistent data collection process. In others, however, quantities of data are acquired from multiple sources, and combined. The legality of each step - the collection activity, the use for analysis, the disclosure, the consolidation, and the mining of the consolidated database - may be resolved, or asserted, or merely assumed.

The quality of the original data varies, with accuracy and timeliness problems inherent. Where data is re-purposed and disclosed or expropriated, the widely varying quality levels of data in the individual databases result in yet lower quality levels in the overall collection.

The meaning of each data-item in each database is frequently far from clear. Nonetheless, data-items from different databases that have apparent similarities are implicitly assumed to be sufficiently compatible that equivalence can be imputed.
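
A hypothetical two-line example illustrates the point - both sources below carry a field labelled 'income', but one records gross annual salary and the other net monthly pay, and a naive merge simply imputes equivalence:

    # Hypothetical illustration: both sources carry a field labelled 'income',
    # but with different meanings (gross annual salary vs net monthly pay).
    source_a = {"person_id": 1234, "income": 82000}   # gross annual, dollars
    source_b = {"person_id": 1234, "income": 4100}    # net monthly, dollars

    def naive_merge(a, b):
        """Merge on field-name alone, imputing equivalence of meaning."""
        merged = dict(a)
        for field, value in b.items():
            merged[field] = value     # the later value silently wins
        return merged

    record = naive_merge(source_a, source_b)
    print(record["income"])   # 4100 - comparable with nothing, correct on neither definition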

Legality, data quality and semantic coherence appear to be of little concern to those responsible for national security applications. The risk of unjustified but potentially serious impacts on individuals is assumed to be of no consequence in comparison with the (claimed) potential to avert (what are asserted to be) sufficiently probable major calamities. The same justifications do not apply to social control applications in areas such as tax and welfare fraud, nor to commercial uses of large scale data-assemblies, but the grey edges between national security intelligence and other applications have been exploited in order to achieve a default presumption that ends justify means.

With legal, data quality and semantic issues resolved, or assumed away, a wide array of algorithms is available and more can be readily invented, in order to draw inferences from the amassed data. In scientific fields, those inferences are commonly generalisations. In managerial applications on the other hand, analysis of Big Data is used to a great extent not for generalisation but for particularisation. Payback is achieved through the discovery of individuals of interest, and the customisation of activities targeted at specific individuals or categories of individuals.

When generalising, there may be statistical justification for ignoring or assuming away data quality issues, and perhaps even incompatibilities between data-items acquired from different sources, at different times, for different purposes. When particularising, on the other hand, overlooking these issues is not justifiable, but rather cavalier. It undermines the quality of decision-making. In many circumstances, the risks arising from low-quality decision-making are borne by the individual affected by it, rather than by the organisation that makes the error.
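
The asymmetry can be made concrete with some illustrative arithmetic - the population, base rate and error rates below are assumptions chosen for the example, not measurements of any actual system:

    # Illustrative arithmetic only: the population, base rate and error
    # rates are assumptions chosen for the example, not measurements.
    population  = 1_000_000
    base_rate   = 0.001     # 0.1% of individuals genuinely of interest
    sensitivity = 0.99      # genuine cases correctly flagged
    specificity = 0.99      # innocent individuals correctly cleared

    true_cases  = population * base_rate
    true_flags  = true_cases * sensitivity                        # ~990
    false_flags = (population - true_cases) * (1 - specificity)   # ~9,990

    precision = true_flags / (true_flags + false_flags)
    print(f"Flagged individuals genuinely of interest: {precision:.0%}")   # ~9%
    # For every genuine case, roughly ten innocent individuals are singled
    # out, and it is they who bear the consequences of the error.

Error rates that wash out when estimating aggregates do not wash out when each flagged record is a person who is then investigated, denied, or priced differently.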

The inferential techniques applied to Big Data are commonly `inductive' and `fuzzy'. Put another way, many inferences cannot be explained, and many inferences are not logically justifiable. The inferences therefore need to be empirically tested before being relied upon. On the other hand, testing costs money, incurs delays, punctures hopes and business models, and in any case lacks the emotive appeal of the magical distillation of information from data. So the truth-value of the inferences tends to be assumed, rather than demonstrated, and the outcomes are judged against criteria dreamt up by proponents of the technology or of its application. Analytical integrity is regarded as being of little or no significance. The appearance of success is sufficient to justify the use of data mining, whether or not the outcomes can be demonstrated to be effective against an appropriate external measuring stick.

Vast amounts of data are subjected to Big Data techniques in the physical sciences - for example the SETI undertaking, genome projects, and the analysis of data generated by CERN's Large Hadron Collider - and to some extent in the social sciences as well. These give rise to concerns such as the applicability of the analytical techniques used, quality controls over data management and the analysis process, and the lack of external standards for evaluating results.

Where the data does, or may, relate to individuals who are, or may become, identifiable, all of the same concerns arise, but additional factors come into play as well. The following section outlines some specific categories of Big Data, in order to provide an empirical base from which consequences can be identified.


3. Big Data Contexts

As noted in the Introduction, Big Data is not new. This section reviews some longstanding instances, and then moves on to more recent and still-emergent forms. In most cases, the example in itself involves a relatively coherent collection process. All, however, are amenable to integration with other sources to generate consolidated collections.

3.1 Consolidation of Government Data Holdings

Clarke (1988) identified a number of facilitative mechanisms for dataveillance. Important among them were the consolidation of databases, and the merger of organisations. The scale of data involved proved challenging, but smaller countries such as Denmark, Finland and Malaysia have achieved considerable concentration, supported by the imposition of a comprehensive national identification scheme.

In Australia, all of the c. 100 social welfare programs have been funnelled through a single operator, Centrelink, since 1997. In 2011, that agency was merged with the operator of the national health insurance and pharmaceutical benefits schemes into a Department of Human Services (with the ominous initials DHS). Australian health databases are being consolidated within the Department of Health, but utilising an identifier managed by DHS. Agencies in Australia have thereby made a complete mockery of data protection laws, overridden the public's strongly-expressed opposition to a national identification scheme, and enabled cross-agency data warehousing and mining.

In various countries, interactions with government have been increasingly consolidated onto a single identifier, denying the legality of multiple identities, and destroying the protection that data silos and identity silos once provided (Clarke 1994a, 1994b, Wigan 2010). The endeavour to impose singular, undeniable identity is increasingly being undertaken as a public-private partnership, with a (failed) Microsoft project being followed by 'real name' policies by Google and Facebook, and recurrent attempts by governments to outsource national eIdentity Management to supranational corporations.

The merger of multiple identities and of the associated data collections, with their highly variable quality and meaning, lowers the quality of decision-making. That creates considerable risks, which are frequently borne by the individuals rather than the organisations they deal with.

3.2 Loyalty Cards

In parallel with the concentration of data held about individuals by governments, consumer marketing corporations have attracted high levels of use of loyalty cards, enabling them to gain access to data trails generated at points of sale far beyond their own cash registers and web-commerce sites.

From this has developed customer relationship management (CRM), seen by some as the most significant initial Big Data application in this area (Ngai et al. 2009). Data derived from these sources can be combined with that from the micro-monitoring of the movements and actions of individual shoppers on retailers' premises and web-sites. Building on that data, consumer behaviour can be manipulated not only through targeted and timed advertising and promotions, but also by means of dynamic pricing - where the price offered is not necessarily to the advantage of the buyer.

3.3 Consumer Profile Databases

Consumer profiling companies have long gathered data, predominantly by surreptitious means, and in many cases in ways that breach public expectations and even the laws of countries that have strong data protection statutes, including most European countries. The US Federal Trade Commission (FTC) announced at the end of 2012 that it was finally going to investigate the operations of nine shadowy so-called 'data brokers': Acxiom, CoreLogic, Datalogix, EBureau, ID Analytics, Intelius, Peekyou, Rapleaf and Recorded Future.

3.4 Social Media

Since about 2000, and much more emphatically since about 2005, consumers have been volunteering huge amounts of personal data to corporations that operate various forms of social media services. Google has amassed vast quantities of data about users of its search facilities, and progressively of other services, and about both users and others through its acquisition, retention and exploitation of all Gmail traffic. Since about 2004, users of social networking services and other social media have gifted to a range of corporations, but most substantially Facebook, a huge amount of content that is variously factual, inaccurate, malicious, salacious and sheer fantasy. Some is about themselves, and some is about their colleagues, their friends and many others who they come into contact with.

Google's revenue-stream is utterly dependent on the skill with which it has applied Big Data techniques to target advertisements and thereby to both divert advertising spend to the Web and achieve the dominant share in that market. In the case of Facebook, the corporation's initial market valuation was based on the assumption that it could gain similarly spectacular revenues.

As any new market structure matures, consolidation occurs. The decades of work conducted by consumer profiling corporations, out of sight and in the background, have been complemented by the transaction-based content, trails and social networks generated by social media corporations. Mergers of old and new databases are inevitable - and in the USA there are few legal constraints on corporate exploitation of and trafficking in personal data. This appears likely to be achieved by the cash-rich Internet companies taking over key profiling companies, e.g. Acxiom is a natural target for Google.

Analysts have documented various examples of new kinds of inferences that can be drawn from this vast volume of data, along the lines of 'your social media service knows you're pregnant before your father does'. These draw on established 'predictive analytics' developed in loyalty contexts (Duhigg 2012), but become much more threatening when they move beyond a specific consumer-supplier relationship. To marketers, this is a treasure-trove. To individuals, it is a morass of hidden knowledge whose exposure can have serious, negative consequences, and an open invitation to speculation, innuendo and false matches.

3.5 Sensor Data

The data flows generated by sensors of various kinds are rapidly becoming an avalanche. RFID (Radio Frequency Identification) tags are already widespread, and have extended beyond the industry value-chain, not only in packaging, but also in the consumer items themselves, notably clothing. They have also been applied to public transport ticketing, and to road-toll payment mechanisms. The use of RFID tags in books was a particularly chilling development, representing as it does a means of extending far beyond mere consumption behaviour towards social and political choices, and attitudes and thought.

RFID product-tags are not inherently associated with an individual, but can become so in a variety of ways. Not least, a sufficiently rich trail associated with a commonly-carried item, such as a purse or wallet, is sufficient that a name-and-address or company id-code is superfluous. Worse, many of the applications of RFID in transport have had identification of the user designed-in, in some cases by requiring the person's identity as a condition of discounted purchase, and in others by permitting payment only by inherently identified means such as credit-cards and debit-cards. The 'intelligent transport' movement has also given rise to the monitoring of cars, which generate intense trails, are closely associated with individuals, and are available to a variety of organisations.
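
A hypothetical sketch indicates why a name is superfluous - the sighting trail of a tag-ID carried in a wallet is itself a behavioural signature that can be resolved to a person (the tag-ID, locations and times below are invented):

    # Hypothetical sketch: the sighting trail of a tag-ID needs no name
    # attached, because the trail itself is a behavioural signature.
    from collections import Counter

    # (tag_id, reader_location, hour) sightings over two days, from doorway
    # readers, transit gates and toll points - invented data for illustration.
    sightings = [
        ("TAG-9F2C", "suburb_X_station",  8), ("TAG-9F2C", "cbd_office_foyer", 9),
        ("TAG-9F2C", "suburb_X_station", 18),
        ("TAG-9F2C", "suburb_X_station",  8), ("TAG-9F2C", "cbd_office_foyer", 9),
        ("TAG-9F2C", "suburb_X_supermarket", 18),
    ]

    def signature(trail, top=3):
        """The most frequent (place, hour) pairs for a given tag-ID."""
        return Counter((place, hour) for _, place, hour in trail).most_common(top)

    # 'Home station at 8, a particular office foyer at 9, the local
    # supermarket at 18' fits very few residents; joined against an address
    # or payroll register, the tag-ID resolves to a person.
    print(signature(sightings))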

Some forms of visual surveillance also give rise to data that is directly or indirectly, but reasonably reliably, associated with one or more individuals. One of the elements of 'intelligent transport' is crash cameras in cars, which may be imposed as a condition of purchase or hire. Like so many other data trails, the data may be used for purposes additional to its nominal purpose (accident investigation), and with or without informed, freely-given and granular consent. Automated Number Plate Recognition (ANPR) has been expanded far beyond its nominal purpose of traffic management, to provide, in the UK but gradually some other countries as well, vast mass surveillance databases.

Devices that use cellular and Wifi networks are locatable not merely within a cell, but within a small area within the cell, and by a variety of means. Disclosure by the device of its cell-location is intrinsic to network operation; but networks have been designed to deliver much more precise positional data, extraneous to that purpose and intended to 'value-add' - in some cases for the individual, but in all cases for other parties. Devices and apps, meanwhile, have been designed to be promiscuous with their location data, mostly without an effective consent. Smartphones, tablets and other mobile devices are accordingly capable of being not merely located with considerable precision - with or without the user's knowledge and meaningful consent - but also accurately tracked, in real time (Michael & Clarke 2013). This has implications not only for each individual's ability to exercise self-determination, but also for their physical safety.

In less than a decade, almost the entire population in many countries has been recruited as unpaid, high-volume suppliers of highly-detailed data about their locations and activities. This data is highly personal and intrusive even before it is combined with loyalty card data and marketers' many longstanding, surreptitious sources of consumer data.

3.6 Smart Meters

In many respects, the much-vaunted 'Internet of Things' is no better than emergent, and perhaps just a promise, or even wishful thinking. On the other hand, some elements have arrived, and the monitoring of energy consumption is one of them.

Smart meter data is commonly transmitted via a wireless network. Further, the data, although nominally for consumers, is essentially about consumers and for energy providers. In accordance with the 'warm frog' principle, monitoring has been initially only infrequent, and the capacity of the provider to take action based on the data has been constrained. Highly intensive monitoring, and direct intervention by the provider into power-supply to the home and even to individual devices, are, on the other hand, intrinsic to the conception of the technology and to many current designs.

3.7 Aerial Surveillance

Satellite imagery has delivered vast volumes of raw material for Big Data operators. At higher resolutions, an amount of personal data is disclosed. A commonly-cited example is the discovery by local government agencies of unregistered backyard swimming pools.

Aerial surveillance from lower altitudes has been sufficiently expensive that its use has been restricted to activities with high economic value or a military purpose. A dramatic change in the cost-profile has occurred since about 2000, with the democratisation of unmanned aerial vehicles (UAVs). Drones have migrated beyond military contexts. Carrying 1080p video, and controlled by smart phones, they are now inexpensive enough to be deployed by individuals for unobtrusive data collection.

Aircraft licensing and movement regulators have not yet resolved important operational aspects of drones, but appear not to be interfering in their use in the meantime. Parliaments and regulatory agencies almost everywhere have failed in their responsibility to impose reasonable privacy constraints on longstanding, fixed Closed-Circuit TV and now Open-Circuit TV. As a result, these new, mobile CCTV and OCTV cameras are operating largely free of regulation.


4. Data 'Ownership', Data Control, Data Rights

It is common for analyses of Big Data economics to refer to a notion of 'data ownership'. However, data is not real estate, and is not a chattel. Under very specific circumstances, data may be subject to one or more of the various, and very particular forms of so-called 'intellectual property', which have been created to enable corporations to not merely recover costs but to make very substantial profits by exercising their monopoly powers and restricting the activities of their competitors. However, many of the kinds of data that are the primary focus of this paper do not give rise to such rights. There are specific contexts in which an ownership concept may be relevant, but as a general analytical tool current notions of property in data are of little value (Wigan 1992, 2012).

In the personal data arena, the more commonly-used and more effective notions are data possession, and more importantly data control. These lead to a recognition that there are frequently multiple parties that have an interest in data, and there may be multiple parties that have some form of right in relation to it.

Aggregators of Big Data commonly perceive themselves to have rights in relation to the data, or at least in relation to the data collection as a whole. They claim the right to analyse it, and to exploit results arising from their analyses. They may claim the right to disclose parts of the data, to share or rent access to it, or to sell copies of some or all of it. Other organisations may claim rights that conflict with those of the aggregators.

Where data directly or indirectly identifies individuals, each individual claims rights in relation to it. Moreover, those claims are directly supported by human rights instruments, which in many countries have statutory or constitutional form. It is a poor reflection on the rule of law in these countries when highly uncertain claims of rights by government agencies and corporations are prioritised for protection over the much clearer claims of individuals.

Tensions between interests in personal data have always existed. A useful test-case is the public health interest in, for example, reports of highly contagious diseases like bubonic plague, which few people contest as being sufficient to outweigh the individual's interest in suppression of the data. The public health interest has been generalised a long way beyond contagious diseases, however. Cancer registries have been established, on the partly reasonable and partly spurious basis that rich data-sets are essential to research into cancers. The same justifications are being used to override the interests of individuals in their genetic data - with little public debate and little in the way of mitigating measures.

Big Data proponents are keen to achieve the same kind of undebated and almost entirely unhampered ability to develop warehouses of the many kinds of personal data discussed earlier in this paper. In doing so, they are implicitly wrenching western civilisation back from the many-centuries-old dominance of individualism to a time when a sense of collectivism was fostered as a convenient means of achieving hegemony over an uneducated and largely powerless population. A philosophy that is associated with the feudal era in Europe has survived in East Asia, is undergoing a revival through the invocation of ideas in good standing such as a 'private data commons', and provides a convenient means for Big Data proponents to justify the overturn of individual rights in relation to personal data and the destruction of privacy.


5. Consequences

Generic concerns about dataveillance were drawn to attention many years ago, and all are very much in play with Big Data. For example, decision-making comes to be based on data that is of low comprehensibility and quality but is treated as though it were authoritative. Other concerns include unclear accusations, unknown accusers, inversion of the onus of proof and hence denial of due process.

The intended consequence may have been improved efficiencies in social control and in marketing; but multiple unintended consequences arise. An individual can be unjustifiably targeted by a social control agency because they fit an obscure model of infraction, even though the agency has very little understanding of the reason why the individual has been singled out, and hides behind vague security justifications to deny the individual access to their automated accuser (Oboler et al. 2012). These new forms of accusation suggest that Franz Kafka was insufficiently imaginative. Other unintended consequences include the manipulation of consumer behaviour through the unconsented (and unverified) inference of their interests, and the denial of consumer choice through the inference-based narrowcasting of marketing information.

The marketing message and mythology for Big Data, on the other hand, stress the extraction of new generalities that are of social and economic value. In the commercial arena, the archetypal (but apparently apocryphal) example is the discovery of hitherto unknown market segments, such as men driving home from work and stopping at the supermarket to buy 'diapers and beer'. Each sub-market for Big Data services has spawned its own pseudo-examples of the brave new world that the techniques are alleged to lead to.

The strong emphasis on the targeting of individuals brings into focus the question of identifiability. In many cases, data-sets retain identifiers for individuals (such as name and birth date, or a unique code issued by a government agency or a corporation). Even in circumstances in which no formal identifier exists, the richness of the data-collection is such that a reliable inference of identity can be drawn, and hence the data is re-identifiable. Generally, claims made about Big Data anonymity are best regarded as being at least highly contestable, and perhaps simply as spurious.
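
A minimal sketch shows how little is required - a 'de-identified' data-set that retains postcode, date of birth and sex can be joined to any public register carrying the same quasi-identifiers (the records and field-names below are invented):

    # Hypothetical sketch of re-identification: a 'de-identified' data-set
    # retaining postcode, birth date and sex is joined to a public register.
    deidentified = [
        {"postcode": "2611", "dob": "1961-04-02", "sex": "F", "condition": "condition_x"},
    ]
    public_register = [   # e.g. an electoral roll or social-media derived listing
        {"name": "A. Citizen", "postcode": "2611", "dob": "1961-04-02", "sex": "F"},
    ]

    QUASI_IDENTIFIERS = ("postcode", "dob", "sex")

    def reidentify(private_rows, public_rows):
        """Link records that agree on every quasi-identifier."""
        matches = []
        for p in private_rows:
            candidates = [q for q in public_rows
                          if all(q[k] == p[k] for k in QUASI_IDENTIFIERS)]
            if len(candidates) == 1:       # a unique fit is an identification
                matches.append((candidates[0]["name"], p["condition"]))
        return matches

    print(reidentify(deidentified, public_register))   # [('A. Citizen', 'condition_x')]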

Some forms of data mining draw valid inferences, but ones that have been intentionally and constructively obscured, and whose exposure creates risks for individuals. This is particularly common in health contexts (Osborne 2012). At a broader level, Clarke (1988) referred to ex-ante discrimination and guilt prediction, and a prevailing climate of suspicion. Recently, authors have shown how Big Data creates the potential for many new forms of unfair discrimination, some financial, some social (Boyd & Crawford 2011, Croll 2012).


6. Conclusions

A half-century ago, data was scarce and expensive, and detailed and careful verification and analysis were performed on it. In the contemporary situation, data is in vast supply, automated processing is essential, and human understanding of the results of that processing is limited.

Big Data involves the exploitation of data originally acquired for another purpose. It commonly involves the contrivance of spurious consent, or unauthorised disclosure, or in some cases the pretence that the data has been anonymised, or a claim that the data was `public' or `publicly available' and, by inference, that the data has been washed free of all legal constraints and can be used however the exploiter wishes. Many aspects are in breach of data protection laws, but these can be readily avoided by corporations through the use of data havens, in particular the USA.

Applying the 'data mining' metaphor, the exploitation of resources normally involves royalty payments to whoever holds the rights in the resource. Yet data miners have been conducting their exploitative activities without any such imposts, denying individuals a return on their asset, their personal data.

Corporations and government agencies, which are in possession of the data, and which are not subject to meaningful controls by regulators or the courts, are in a strong position to protect their interests, whether they have formal rights or not. Individuals are excluded from the process, lack power, and have rights that are not protected by enforcement mechanisms. A new reconciliation is needed between the interests of the parties involved.

The problems arise not only at the level of the rights of individuals. The governance of democracies is directly affected. At one level, micro-targeting tools deployed in recent elections have demonstrated the risk that governments will sustain power through manipulative means more subtle than mere, old-fashioned, populist demagoguery. At another level, the transparency of individual behaviour to powerful employers, suppliers and social control agencies results in a chilling not only of criminal and anti-social behaviour, but also of artistically creative behaviour, and economically and technologically innovative activities. Western nations, through the Big Data epidemic, are risking stasis as grinding as that experienced in post-War East Germany.

As the volumes of data grow, and the Internet of Things takes hold, universal surveillance graduates from a paranoid delusion to a practicable proposition. The survival of free societies depends on individuals' rights in relation to data being asserted, and the interests of Big Data proponents being subjected to tight controls.


References

Bollier D. (2010) `The Promise and Peril of Big Data' The Aspen Institute, 2010, at http://www.aspeninstitute.org/sites/default/files/content/docs/pubs/The_Promise_and_Peril_of_Big_Data.pdf

boyd D. & Crawford K. (2011) `Six Provocations for Big Data' Proc. Symposium on the Dynamics of the Internet and Society, September 2011, at http://ssrn.com/abstract=1926431

Clarke R. (1988) 'Information Technology and Dataveillance' Comm. ACM 31,5 (May 1988) Re-published in C. Dunlop and R. Kling (Eds.), 'Controversies in Computing', Academic Press, 1991, PrePrint at http://www.rogerclarke.com/DV/CACM88.html

Clarke R. (1993) 'Profiling: A Hidden Challenge to the Regulation of Dataveillance' Int'l J. L. & Inf. Sc. 4,2 (December 1993), PrePrint at http://www.rogerclarke.com/DV/PaperProfiling.html

Clarke R. (1994a) 'The Digital Persona and its Application to Data Surveillance' The Information Society 10,2 (June 1994), PrePrint at http://www.rogerclarke.com/DV/DigPersona.html

Clarke R. (1994b) 'Human Identification in Information Systems: Management Challenges and Public Policy Issues' Info. Technology & People 7,4 (December 1994), PrePrint at http://www.rogerclarke.com/DV/HumanID.html

Craig T. & Ludlof M.E. (2011) `Privacy and Big Data: The Players, Regulators, and Stakeholders' O'Reilly Media, 2011

Croll A. (2012) `Big data is our generation's civil rights issue, and we don't know it: What the data is must be linked to how it can be used' O'Reilly Radar, 2012

Duhigg C. (2012) 'How Companies Learn Your Secrets' The New York Times, February 16, 2012, at http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html?pagewanted=1&_r=2&hp&

Dumbill E. (2012) `What is big data? An introduction to the big data landscape' O'Reilly Strata, 11 January 2012, at http://strata.oreilly.com/2012/01/what-is-big-data.html

Furnas A. (2012) `Everything You Wanted to Know About Data Mining but Were Afraid to Ask' The Atlantic, 3 April 2012, at http://www.theatlantic.com/technology/archive/2012/04/everything-you-wanted-to-know-about-data-mining-but-were-afraid-to-ask/255388/

McKinsey (2011) `Big data: The next frontier for innovation, competition and productivity' McKinsey Global Institute, May 2011, at http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation

Michael K. & Clarke R. (2013) 'Location and Tracking of Mobile Devices: Überveillance Stalks the Streets' Forthcoming, Comp. L. & Security Rev. Jan-Feb 2013, PrePrint at http://www.rogerclarke.com/DV/LTMD.html

Ngai E.W.T., Xiu L. & Chau D.C.K. (2009) 'Application of data mining techniques in customer relationship management: A literature review and classification' Expert Systems with Applications, 36, 2 (2009) 2592-2602.

Oboler A., Welsh K. & Cruz L. (2012) `The danger of big data: Social media as computational social science' First Monday 17, 7 (2 July 2012), at http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/3993/3269

Osborne C. (2012) ''Big Data' or 'corporate spying'?' ZDNet, 6 November 2012, at http://www.zdnet.com/big-data-or-corporate-spying-7000006983/

Ratner B. (2003) `Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data' CRC Press, June 2003

Shing-Han L., Yen D.C., Lu W-H. & Chiang W.C. (2012) 'Identifying the signs of fraudulent accounts using data mining techniques' Computers in Human Behavior 28, 3 (2012) 1002-1013

Wigan M. R. (1986) 'Engineering tools for building knowledge based systems' Microcomputers in Engineering 1, 1 (1986) 52-68

Wigan M.R. (1992) 'Data Ownership' in Clarke R.A. & Cameron J. (Eds.) 'Managing the Organisational Implications of Information Technology, II' Elsevier / North Holland, Amsterdam, 1992

Wigan M. R. (2010) 'Owning identity - one or many - do we have a choice?' IEEE Technology and Society Magazine, 29, 2 (Summer) 7

Wigan M. R. (2012) 'Smart Meter Technology Tradeoffs' IEEE International Symposium on Technology and Society in Asia (ISTAS), 27-29 October 2012, Singapore (accessed at IEEE Xplore 11-1-13)


Author Affiliations

Marcus Wigan is Principal, Oxford Systematics, Melbourne. He is also an Emeritus Professor, Transport and Information Systems, Edinburgh Napier University; a Visiting Professor, Imperial College London; a Partner in GAMUT, Faculty of Architecture Building and Planning, The University of Melbourne; a Professorial Fellow, Melbourne Sustainable Society Institute, The University of Melbourne; and an Adjunct Professor, ICT Faculty, Swinburne University of Technology.

Roger Clarke is Principal of Xamax Consultancy Pty Ltd, Canberra. He is also a Visiting Professor in the Cyberspace Law & Policy Centre at the University of N.S.W., and a Visiting Professor in the Research School of Computer Science at the Australian National University.


