In this chapter we will address the rather novel phenomenon of virtual data. We will start by introducing some concepts that are necessary for our considerations. By virtual data we mean all data that are generated in cyberspace and gathered for the purpose of scientific research. The term cyberspace describes the ‘space’ of social interaction and communication that is made possible by the Internet as the technical entity of networked computers and the World Wide Web as one service run on this infrastructure. A subclass of virtual data are mobile data, which are generated or edited on mobile phones, smartphones and other mobile devices. We will conclude the chapter with a consideration of types of mobile data that have become increasingly relevant.
Due to the rapid adoption of personal computing and the Internet during the last two decades, qualitative research finds itself with some new challenges. In cyberspace we are finding complex ways of social interaction which demand new or changed methods of data collection and analysis. Along with those developments we also saw the establishment of new kinds of mobile communication. Mobile telephony between any two places on earth was only the beginning, because other services like SMS (Short Message Service) and several stages of a mobile Internet have led to other innovative applications and, again, different ways of communication, which we will examine in this chapter.
Looking back at the historical evolution of research in cyberspace, one might argue that the discussion during the first generation of Internet studies mainly relied on a variety of predictions and assumptions about the effects of the Internet, without much empirical effort being made (see Wellman and Gulia, 1999; Wellman, 2004). Systematic collection of data, and a discussion about methods and methodologies based upon it, only started during the middle and late 1990s, when different disciplines tried to transfer established research approaches to the World Wide Web. The second generation of Internet studies mainly consisted of documentation of phenomena, spaces and structures in cyberspace: ‘The second age was low-hanging fruit with analysts using standard social scientific methods – and some concepts – to document the nature of the internet’ (Wellman, 2004: 127). At the same time, the Net was still treated as an isolated phenomenon, an assumption that was also brought forward by the ‘citizens’ themselves, as in the popular declaration of cyberspace independence that starts out like this:
Governments of the Industrial World, you weary giants of flesh and steel, I come from Cyberspace, the new home of Mind. On behalf of the future, I ask you of the past to leave us alone. You are not welcome among us. You have no sovereignty where we gather. (Barlow, 1996)
For the current, third generation of Internet studies, it has become vital to work with different kinds of data collected from the Internet. Because of this, technical developments are of great importance to the researcher, as new technological possibilities may pose new challenges.
While for over a decade the Net was mainly represented in textual form, it started to change to a more graphical appearance with the emerging World Wide Web. Growing bandwidth and new technologies have dramatically transformed the Internet. Nowadays we come across a great variety of multimedia artefacts; written text is a secondary consideration when dealing with many online services like Flickr, YouTube, online radio and podcasts. We find an enormous set of audiovisual forms by which we can articulate ourselves, and, like anything in cyberspace, they tend to be linked to each other in complex configurations.
The Internet and computer-mediated communication (CMC) allow researchers to disconnect from local and temporal boundaries attached to face-to-face (F2F) communication and thus approach people who might not have been approachable prior to CMC. Mann and Stewart point out some examples like ‘mothers at home with small children, shift workers, people with agoraphobia, computer addicts, people with disabilities’ (2000: 17ff.).
Additionally a researcher could contact participants who did not want to discuss the subject matter during a face-to-face session, but who do agree to discuss the material via CMC (Turkle, 1995). CMC can also have a safety function for both researcher and informant when it comes to censored or politically sensitive information.
Challenges arise from the fact that online research requires specific skills, as researchers are facing different circumstances in cyberspace than they would in the actual world. Using the Internet as an alternative way to access the field or as a form of communication technology might alter the outcome, but it can also enrich the research process due to its optionality. Online research offers a wide variety of proven methods, such as the online interview, online focus groups, or even several ethnographic approaches whose roots are in classical research methods but whose characteristics are novel today (see Mann and Stewart, 2000; also see Kozinets et al., Chapter 18, this volume). Specific methods will not be discussed at this point, but we want to point out that any transformation of established research methods must take the technological context into account. Questioning an informant via email poses different challenges to the researcher than a classic F2F interview does: the asynchronicity of email, for example, may create a greater distance between the two actors. There are also linguistic challenges for qualitative research. For example, gestures that can be recognized during an F2F interview are not observable in text-based chat, so users often take advantage of special expressions like emoticons. When evaluating and analysing data in the field, the researcher needs to consider that, apart from common phrases and emoticons, subcultural phenomena can also have a huge impact on the way users express themselves in a specific context.
The ways in which users generate data depend not only on the cultural context, but also on technological circumstances. In cyberspace, in addition to being anonymous, users often have the option to act and participate under pseudonyms, influencing online self-representation by choosing a specific nickname or alias. Role-playing and its implications for identity construction have been under discussion ever since virtual communities started to emerge; the discussion still persists today (see Rheingold, 1993; Turkle, 1995).
Further implications arise from the given data as well as access to the field. The researcher is confronted with a wide variety of given and generated data sets and must find a valid point of entry into the field. To provide a better understanding regarding the different types of virtual data, some characteristics will be identified and discussed in the following sections.
If we look explicitly at data that are being generated by users in cyberspace, we typically can differentiate between static data and dynamic data. By static data we understand the kinds of data that (1) are not created by different users interacting with each other and (2) remain basically unchanged while they are continuously accessible. In this sense, many classic homepages can be considered collections of static data, since they tend to be available for long periods of time but are not usually altered through user interaction. By comparison we have dynamic data in situations of interaction, which means that they react to data generated by other users, as in a thread in a bulletin board discussion. If a user were to start a thread without referring to another discussion, and if the thread were not picked up by other users, it would not be considered dynamic. While the threads in bulletin boards may be relatively persistent, this persistence has vanished on today's social networking sites, whose continuous data streams replace it with volatility. When considering data quality, this differentiation is important, because it may cause different challenges to arise during the research process.
The difference can partly be traced back to the technical evolution of the Internet. Especially in the beginning of the digital age, it took greater effort to put data on the Internet than to retrieve data. To publish a website, a person needed webserver software and separate access to another computer (e.g. via file transfer protocol or ‘FTP’) connected to the Internet. The mere creation of a webpage required either knowledge of HTML or special software that took care of it. Even when the Internet was publicly available, only an elite were actually able to do that. It is for this reason that the World Wide Web during its first decade shaped an asymmetric relationship between the consumers and the producers of content, a relationship which could only be resolved very slowly. Even with free services like Geocities that took care of most of the technical effort and could be considered a mass phenomenon during their time, there still remained thresholds, like the ones we mentioned earlier, which a person had to overcome for active participation. To create dynamic content, one needed even more technical knowledge and incurred additional production costs (e.g. connecting to databases). During the last decade we have witnessed a shift to a so-called ‘social’ Web, which focuses even more on dynamic (i.e. interaction-generated) data, which is why we think it makes sense to talk about the two types of data, static and dynamic, in more detail.
The early, text-based Internet consisted primarily of static content. On the one hand this meant that the relevance of content lasted for a longer period of time; on the other hand, the content could be kept available for years or even an indefinite time span (high persistence). Some very central data collections were migrated to newer technologies and actively cared for over generations by different people. Examples can be found in early Usenet, BBS or FTP servers that still exist today. Even the first Web-based services followed this pattern, which is evident from a glance at some of the bulletin boards or private homepages that still exist all over the Web (see Döring, 2002). The common pattern seems to have been that new content was added but old content remained basically untouched. Even when data were created through volatile forms of communication, they could be transformed into archives and thus become static. Monitoring changes on static websites seemed like a tedious task, but feeds and other automatically generated data fragments are much easier to analyse and can easily be sorted chronologically.
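The chronological sorting of feed data mentioned above can be illustrated with a small sketch: parsing an RSS 2.0 fragment with Python's standard library and ordering its items by publication date. The feed content here is invented for illustration.

```python
# Sketch: sort the items of a minimal RSS 2.0 feed chronologically.
# The feed text is an invented example, not a real data source.
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

RSS = """<rss version="2.0"><channel>
  <item><title>New forum rules</title><pubDate>Tue, 03 May 2011 09:00:00 GMT</pubDate></item>
  <item><title>Site relaunch</title><pubDate>Mon, 01 Mar 2010 12:00:00 GMT</pubDate></item>
  <item><title>Member meeting</title><pubDate>Fri, 17 Jun 2011 18:30:00 GMT</pubDate></item>
</channel></rss>"""

def items_by_date(rss_text):
    """Return (date, title) pairs from an RSS fragment, oldest first."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        date = parsedate_to_datetime(item.findtext("pubDate"))
        items.append((date, item.findtext("title")))
    return sorted(items)

for date, title in items_by_date(RSS):
    print(date.date(), title)
```

Because feeds carry machine-readable timestamps, a diachronic ordering of content fragments is a few lines of code, whereas reconstructing the change history of a static webpage would require repeated manual comparison.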
For scientific research, static data come with a long-term validity period and make it possible to do analysis over large time spans. Most of the time, static data are already sorted systematically, for example in a chronological format for easy navigation. Sometimes this happens on the level of the actual service, as on a web space service provider, for example; it can also be done by services like the ‘Wayback Machine’ offered by the non-profit organization Internet Archive (see http://www.archive.org/web/web.php). Snapshots have been taken since 1996 of a large portion of freely accessible websites, so that a person may be able to recreate a certain website's content and structure from various times in its past. These systematic approaches can also be found in the realm of computer science, where information retrieval or web archiving (Brügger, 2011) tackles the questions of systematic data collection. For scientific research, this means that extensive databases have already been created and you might not need to collect data yourself, though you may still need to filter and sort the data according to your specific research question in order to make the most of them. Approaches like information retrieval may also be helpful for qualitative social research because tools for data collection may be better realized as interdisciplinary efforts.
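Accessing archived snapshots can be sketched briefly. The Wayback Machine addresses snapshots through URLs of the form `https://web.archive.org/web/<timestamp>/<url>`; the example page and date below are invented for illustration.

```python
# Sketch: construct a Wayback Machine snapshot URL for a given page
# and date. The page and date are hypothetical examples.
from datetime import datetime

def wayback_url(page_url, when):
    """Address the archived snapshot of page_url closest to `when`."""
    stamp = when.strftime("%Y%m%d%H%M%S")  # 14-digit timestamp
    return f"https://web.archive.org/web/{stamp}/{page_url}"

print(wayback_url("http://example.org/", datetime(2005, 6, 1)))
```

Requesting such a URL in a browser redirects to the snapshot nearest the given date, which makes it possible to reconstruct a site's content and structure at different points in its history.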
If we look at the historical and technical evolution, we can appreciate that the Internet has always been subject to a constant dynamic, which we can find both inside the infrastructure and in the ways it has been used socially. The development towards a social web reinforces this notion and at the same time gives us access to certain types of data. By ‘dynamic data’ we therefore mean data that users generate in interactive contexts when they react with their own data to the data of other users. In the case of communicational data, this would be information that has been picked up and republished while being changed in the process, as with ‘retweets’ on Twitter. The change can also come from contextualization. This does not imply that dynamic data are a novelty or a recent phenomenon, but rather that through the Internet's structural changes, which we are able to reconstruct, earlier frameworks and ways of utilization have been altered.
An important aspect comes with the archivability of data. Most of the time, data might be archived using additional tools and resources even when the service itself does not include this functionality, but we may also find huge streams of data that are volatile and of short-term availability. This shows the dynamic aspect's influence on data. An example can be found in the development of mobile devices. While we are able to reconstruct a diachronic perspective of bulletin boards, newsgroups, or blogs by means of a search function or an archive, social networking sites are an enormous challenge in this regard because of the relative volatility of data. This volatility results from a high interactivity that creates massive streams of data – as with Twitter, in which one tweet will ‘chase’ another. Data streams are very individual, because their composition depends on how many and which other users a person is following. But volatility also depends on the service's design and its technological framework. Particularly with mobile data we find that the central artefact on which data fragments are based can be very fleeting. As one example, the service Foursquare offers the ability for spontaneous local networks to arise based upon a common location and people's remarks about that particular place. By checking in at the same time to the same events or location, individuals' social networks might be extended for a limited period of time. The options for interaction are well defined: you can choose a place, post a comment, add a picture or photograph, and you may also send your check-in to other services. The latter is key to the current generation of dynamic data as described earlier. Based on the limits of a service or an application, the parameters of participation are changed. And while thematic communities are not a novelty, we do face new challenges as we have to identify new structural features with those new services.
Other examples could include virtual game worlds. Especially in games like MMORPGs (Massively Multiplayer Online Role-Playing Games) we can find highly complex and yet volatile processes of communication, with different motivational aspects (see Yee, 2007).
The first examples of virtual communities on the Internet were created in the 1980s, long before Tim Berners-Lee developed the World Wide Web at CERN (Berners-Lee, 2000) and the Net was opened to broad public and commercial interests. The term virtual community was coined by Howard Rheingold (1993), who wrote about the early online community known as The WELL. The term started to grow into a universal concept for online communities in cyberspace. Because an important part of research in cyberspace focuses on the subject of communities, we will use them as examples here. The first generation of Internet studies, as we described, primarily tried to grasp those online phenomena and to describe the structures. However, many efforts back then relied on, as Wellman puts it, ‘conjecture and anecdotal evidence’ rather than on the systematic collection of data (2004: 124).
Using two specific types of online community, we are going to explain established research methods. The classic virtual communities have been around for about two decades and plenty of research has already been done on them. During the last decade, however, social networking sites seem to have become the dominant type of community on the Web.
Qualitative research on community structures can be achieved through an ethnographic approach, which means having access to the type of community that is of interest. On the one hand, researchers can try to contact members of a community in real life (e.g. for further research, like classic offline interviews) and to collect data from them outside of cyberspace, but this would ignore the virtual environment and therefore would not deliver virtual data as defined in the beginning. On the other hand – and this will be our focus here – one can get data from media artefacts (e.g. layout and structures of websites or software platforms, collected communication and discussions, documentation, etc. – see Kozinets et al., Chapter 18, this volume). Apart from those structural elements, the researcher can use the means of communication available to interact with people directly.
To specify the different types of interesting data that could be found on community platforms, one would need a basic structure. Marotzki (2003) has developed a set of structural features of virtual communities, which were collected through a comparison of 40 online communities prior to Web 2.0 (Jörissen and Marotzki, 2009: 192ff.). Note that this is only one of several possible ways to structure online communities. The features consist of: (1) metaphor/infrastructure; (2) sociographic structures (system of rules); (3) communication structure; (4) information structure; (5) structure of self-presentation; (6) participatory structure; and (7) online–offline relations. This structure, which also demonstrates possible areas of research interest, helps us to get an impression of the types of data we could expect.
The first feature was formerly named leading metaphors, as many communities established real-world allegories that served as a template for the communities' structures and visual appearance, like the popular metaphor of the city (Dieberger, 1994). Such metaphors fell out of use after the end of the 1990s, which is why the term ‘infrastructure’ seems a better fit to describe the appearance and technical base of any given online community (Jörissen, 2007). It consists of the technical aspects, like the software that is used or a description of the functions of a platform. This also includes the community's layout and visual patterns, from which it may be possible to tell what the goals or general idea behind a community might be. If it is a website we are looking at, several basic forms could be identified (like wikis, blogs or bulletin boards, for example); outside of the Web, we usually have some kind of software to access the virtual space (as with Second Life or some online games), which may also follow certain standards in structure. The technical possibilities and functions of the infrastructure define the frame of reference for the following features.
The sociographic structures are considered to be a system of assigned social positions as well as special rights and duties within the community. This especially includes a set of rules, which may regulate access to the community and behaviour once a user is inside. Those sets may become quite complex and contain educational aspects, because they generally aim at rewarding desirable behaviour and sanctioning undesirable behaviour. There might be a sophisticated registration process, which could be analysed. We might be able to deduce the self-understanding of a community from these sorts of rules and processes.
The structure of communication describes all forms of communication that the members of a certain community may use, such as chats, boards or comment systems. From their technical parameters (e.g. asynchronicity/synchronicity or one-to-many/one-to-one/many-to-many) we get a rough framework, which is usually specified further through processes of social negotiation (e.g. rules). We can tap into this with methods of content or discourse analysis to get data from which we may be able to learn the effects of the forms of communication.
Like the previous feature, the information structure consists of all the aspects and ways through which information is presented in a community and to whom it is available. Some communities create large internal collections of links or texts, which grow continuously and can form very complex structures over time. Of course, apart from text, this can include pictures, sounds or videos, each of which requires different methods of analysis.
The structure of self-presentation asks about the ways in which individual members manage their identities. In communities, members usually encounter options for creating a profile or an identity card, and they submit information to it in an effort to present themselves to other members. The most basic example of this is choosing a nickname, but members may include additional information, pictures, or even a graphical avatar. Users decide whether or not to create a special online identity and how similar it is to their real-world identity. The Internet offers a way to reflect on oneself and to create a completely new self (see Boyd, 2007; Turkle, 1995; Marcus et al., 2006). Furthermore, users may have a private space online that they can share with a certain group of friends. Obviously, analysing all of this can be quite complex, and it may be complicated to reconstruct a whole identity. In the case of an analysis of graphical representations, for example, avatars might deliver interesting data (Jörissen and Marotzki, 2009).
The degree to which members might be able to participate in a community is subject to its participation structure. This is a central consideration when someone is offering the service, and so control may or may not be shared and decisions about the community might be made by one person or just a few people. There may also exist more democratic structures that offer rights of participation to any member; sometimes, a community's governing bodies are established through a process by which members choose privileged users to represent them. Those structures might be documented explicitly (e.g. in the rules or the terms of service), or they might be extracted from different roles that users might take in a community. Any discussion board, for example, might have a hierarchy of administrators, moderators, and registered and unregistered users, and each stratum may have different rights and functions to perform. If it is not otherwise available, we may get this kind of information through questionnaires, group discussions or interviews, depending on the research question. While certain roles might exist in different communities, their dedicated functions and qualitative character are usually bound to the specific community.
Depending on the aim of a community, there could be relations between online and offline activities of variable strength. There might be members' meetings, for example, or real-world events. Such things may be important if an online group was created from an already existing offline group and was designed to meet its specific needs. It is not uncommon for groups in communities to be created from the real-world proximity of their members. In the end there are always real people behind virtual communities, and their thinking and acting in the offline world may influence what they do online. In current research there seems to be an obvious and increasing trend to integrate online and offline aspects.
From these structural features, we can easily see that many kinds of virtual data could be gathered for any given research question. There might be a combination of more objective data, like written rules or technical structure (functional layout, visual style), with subjective data collected from informants. A triangulation of several methods of data collection and analysis will likely be necessary. Also, as mentioned previously, some kinds of data might not be collectable at all through certain established methods in cyberspace. There are special challenges inherited from the characteristics of CMC.
In many cases, access to the field for the researcher might be easy because the technical barriers are low (e.g. when there is only simple registration). So it might be quite easy to collect data from a given community. Yet researchers need to realize that they might operate in closed (non-public) or even private contexts. Thus they need to make sure to assess the character of a community prior to collecting and using data from there. In almost every case there are aspects of research ethics and law that may affect the researcher; we will elaborate on those at the end of the chapter.
With ongoing technical development and the evolution of the Internet, the options for creating virtual communities have changed as well. With the beginning of the so-called Web 2.0 we find more specialized and complex examples. Research has shown that communities online are no longer considered to be structures of only strong ties, but have grown to include weak ties and individual social networks. These changes are accompanied by a change in focus of many commercial services in cyberspace and might be considered a global mass phenomenon at the moment.
After several kinds of virtual community spread across the Internet, new forms started to emerge that followed a rather different approach to community structure. Social networking sites are one of the main innovative phenomena brought forth by the social web or Web 2.0, and they have evolved continually since the beginning of the millennium. Along with new tools for online participation and collaboration, a massive growth in user numbers, and a relocation of complex applications from the PC to the Web (or the cloud), new web platforms have been built. They differ from classic virtual communities in that they are designed with a low threshold for obtaining membership and deal primarily with weak ties between individuals. Following Boyd and Ellison, social networking sites can be described as:
web-based services that allow individuals to (1) construct a public or semi-public profile within a bounded system, (2) articulate a list of other users with whom they share a connection, and (3) view and traverse their list of connections and those made by others within the system. The nature and nomenclature of these connections may vary from site to site. (2007)
They emphasize that actual networking, which aims at establishing new connections with previous strangers, plays a minor role in those platforms and that the primary interest of users so far seems to be the representation and nurturing of existing contacts. Therefore communication is most likely to happen between people who already have a connection or who have already met in real life. Apart from this, the central idea consists of the creation and design of a personal profile, which connects to the identity management aspect in classic virtual communities. Despite these overlapping aspects, there are also new research methods available, which do provide researchers with different kinds of virtual data (see Kozinets et al., Chapter 18, this volume).
The first social networking site, SixDegrees, started in 1997, but the service was closed in 2000. Friendster was founded in 2002 and became the first social networking site to gain some popularity. During the last 10 years several large services have been developed, differing mainly because of a focus on certain technical features, on particular subjects, or on users from a shared geographical region. Twitter (http://twitter.com), for example, is a large micro-blogging platform through which you can exchange short messages of 140 characters or less with your social network (called your ‘followers’). Last.fm (http://last.fm) is a Europe-based social networking site that focuses on the musical preferences of its users. Facebook (http://facebook.com) started out as a networking tool for students at Harvard University but gradually transformed into a worldwide meta-network for everybody. These few examples already show that every distinctive service can have very specific features even though they all serve the function of a digital social network.
To look into these new phenomena, researchers again can rely only to a limited extent on established forms of data and the methods used to gather them. Many of the structural features mentioned earlier can also be found on social networking sites, but such sites lack the clear boundaries and seclusion of a classic virtual community as well as the commonly visible strong ties between all members. But perhaps more obviously now than ever before, we are finding opportunities to apply the qualitative method of social network analysis – first used systematically in the 1950s (Barnes, 1954) – to describe social ties between individuals and the structures emerging from them. The method represents a ‘shift from the individualism common in the social sciences toward a structural analysis’ (Jones, 1999: 78). The method is based on the notion that researchers can make assumptions about a certain group, community or organization and its meaning for both the entity and the individuals involved by looking into the social relations that constitute the network. In cyberspace it is of course limited to digital ways of communication. Without explaining the method itself in detail (see Blank et al., 2008; Garton et al., 1999; Gaiser and Schreiner, 2009), it delivers a new kind of data, namely a network of social relationships that has gained new relevance because of the rise of social networking sites. We can differentiate between ego-centric and whole or organizational networks. With the former, of course, one has the ability to look at the emerging influences that a social network might have on any specific individual. Where the borders of a network should be drawn depends on the specific context and the research question at hand, so that we have a very flexible approach. Social networking sites work especially well for social network analysis if we want to look at the dynamic exchange of information or aspects of mutual support.
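The distinction between whole and ego-centric networks can be made concrete with a toy sketch: given a whole network as an adjacency mapping, extract one member's ego network (the ego, their direct contacts, and the ties among those contacts). The names and ties are invented for illustration.

```python
# Toy sketch of an ego-centric view on a whole network.
# The whole network: each person mapped to their set of contacts.
NETWORK = {
    "anna":  {"ben", "carla", "dave"},
    "ben":   {"anna", "carla"},
    "carla": {"anna", "ben"},
    "dave":  {"anna", "emil"},
    "emil":  {"dave"},
}

def ego_network(network, ego):
    """Return the subgraph containing ego and ego's direct contacts,
    keeping only ties among those members."""
    members = {ego} | network[ego]
    return {person: network[person] & members for person in members}

sub = ego_network(NETWORK, "anna")
# emil is two steps away from anna, so he drops out of her ego network
```

Drawing the border one step out from the ego is itself an analytic decision; as noted above, where the borders of a network should be drawn depends on the context and the research question.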
Prior to social networking sites the only way to reconstruct a personal social network involved using questionnaires and interviews (see Roulston, Chapter 20, this volume), letting people write journals, or observation (see Marvasti, Chapter 24, this volume). Social networking sites by design make individuals' social networks visible, and the quality and strength of ties can be assessed from acts of communication or from labels that are attached to them (e.g. friends, family or fans). Furthermore, it is possible to identify special roles or actors in a social context, which can become the focus of ongoing qualitative research. Whereas an analysis based on structural features requires that a specific structure actually be present, network analysis is more open. Modern technical platforms, which only enforce very weak structures and leave specific possible uses to the actual users, may be easier to grasp with this approach. In another step, researchers may then take a closer look at online self-representation, means of communication, or rules.
Thus it seems evident to us that the theoretical framework and method of social network analysis are a highly relevant approach to analysing social networking sites, as they offer new ways to gather and evaluate virtual data on social relations. Especially when looking at method triangulation, there are new options and new potential views on virtual communities. But there are also some new problems. Collecting data from a very volatile source like Twitter might be a challenge due to the potentially huge amounts of data. A specification (e.g. using keywords or hashtags) might not serve the research question, as communication threads might not be captured, a problem that stems from the data's dynamic nature as explained earlier. An automated way to collect data would be necessary when dealing with large data streams, yet flexible, case-by-case adjustments would also be needed. Such an analysis of dialogues seems to be problematic from a linguistic perspective at this point (Zappavigna, 2011).
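The capture problem described above can be sketched in a few lines: filtering a message stream by hashtag keeps tagged posts but silently drops untagged replies, so conversation threads arrive incomplete. The stream data here are invented for illustration.

```python
# Sketch: keyword filtering of a volatile message stream loses
# thread context. All messages and users are invented examples.
STREAM = [
    {"id": 1, "user": "a", "text": "New paper out! #qualdata", "reply_to": None},
    {"id": 2, "user": "b", "text": "Congrats, where to find it?", "reply_to": 1},
    {"id": 3, "user": "c", "text": "Reading it now #qualdata", "reply_to": 1},
]

def by_hashtag(stream, tag):
    """Keep only messages whose text contains the tag."""
    return [m for m in stream if tag in m["text"]]

captured = by_hashtag(STREAM, "#qualdata")
# message 2 belongs to the thread (it replies to 1) but lacks the
# tag, so a hashtag-based collection never sees it
```

This is why automated collection based on keywords alone may not serve the research question: reconstructing dialogues requires following reply relations beyond the filter criterion.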
Based on the previous examples of research scenarios (virtual community research and social networking sites), virtual data can be classified into two groups. First, there are data which have already been generated and can be found and retrieved from the Internet. This includes, for example, archived conversations from newsgroups, online communities, or other kinds of Internet platforms, such as online documents or friend lists. Second, a researcher or research group can also generate data by conducting polls, conducting online interviews, or setting up a specific online service themselves, such as a wiki. Even if only textual data are the focus of epistemological interest, this may already result in a very complex set of interconnected data types and contextual forms, which can be analysed with methods such as discourse analysis, interview analysis or even corpus analysis.
Media artefacts such as images or videos can be evaluated separately with methods of image or film analysis, but connections between them always raise the questions of which data should be included, how to triangulate between different qualitative research methods, and how the whole sample should be structured. For example, when trying to derive ego-centred social networks from a video platform such as YouTube, additional information can be gathered by analysing the audiovisual articulation. This can be helpful for interpreting the relations between the users, and it goes far beyond looking for the nodes and relational positions inside a social network.
For research on social phenomena in an online group of players, as in a guild of MMORPGs, looking at the community and [Page 459]its structural elements as described earlier might be a key aspect. The researcher could therefore systematically observe the ‘naturalistic’ behaviour of users (see Schroeder and Bailenson, 2008). Even an online ethnographic approach would need to consider different media artefacts in order to understand subcultural dynamics and models of behaviour completely. It could thus be necessary to record in-game footage while players communicate and act together. It could also be very helpful to record their communication on a voice server. This could be realized, for example, with TeamSpeak, software that allows computer users to speak on a chat channel, much like a telephone conference call. This tool is designed for gamers and could provide additional help to researchers looking at gaming contexts, as they would use the same tools as the target group, thus perhaps increasing their social acceptability. Collecting online data such as in-game footage might require a huge amount of hard disk space, since the video material could be uncompressed. The kinds of data and evaluation methods a researcher requires will depend greatly upon the individual epistemological interest.
Since data collected online are represented in digital form, they can in many cases be easily processed by a wide variety of specific software tools for qualitative research. Chapter 9 of A Guide to Conducting Online Research offers a general introduction to analysing online data (see Gaiser and Schreiner, 2009: 113ff.). There are several software applications, such as MAXQDA, NVivo or HyperRESEARCH (see Gibbs, Chapter 19, this volume), that are specifically designed for processing and coding text-based data and offer plenty of different tools. For reconstructing and visualizing social network relations, researchers can take advantage of tools such as InFlow. Several other tools can help with the basic tasks of data processing, but some challenges persist. The problem of multimedia samples consisting of different media types is not solved completely, because a coherent sample structure must still be maintained when interpreting the data. References to pictures, video or sound fragments inside a sample might not be easy to include, and some methods will still require transcription or fragmentation of media artefacts before they can be integrated with other data. Thus working with multimodal data sets remains a sophisticated task despite the use of digital tools.
While for a long time the classic Internet was bound to stationary devices like the PC, we are now seeing the development of smaller, more mobile devices, particularly smart-phones, which is leading to a new expansion of cyberspace. Currently we are able to carry much of the functionality of the Internet in our pockets at all times. Mobile data, therefore, are a new kind of structured virtual data found in new contexts and scenarios.
On a very basic level, the differences for mobile data come either from technical limitations or from the more flexible usage contexts. Mobile online services are mostly designed for a very specific purpose, and the form and kind of data are strictly pre-structured most of the time. Early mobile services like SMS (text messaging) could only transfer messages of a certain length because of technical limitations. The need for specifically formatted messages was then taken up by services like Twitter, which could technically offer much longer messages but kept the format and the limitations that came with it for convenient mobile communication. Strict enforcement of the 140-character limit has directly influenced usage. For example, URL-shortening services are offered to reduce the size of links, and even the style of language is heavily influenced by the limit (Boyd et al., 2010). Furthermore, social practices like retweeting, which means sharing messages from others with all of one's followers, have become established because of these conveniently small message formats. Long postings to [Page 460]blogs cannot be done easily on mobile devices and do not fit well with mass distribution. Therefore, while technical access to the ‘real’ Internet over mobile devices might be available, services that specifically cater for mobile devices and special mobile applications (or apps) are much more popular. They may also use Web-based interfaces, but they do not necessarily have to deliver the same content that would be accessible via the Web.
Not only are user interfaces redesigned for mobile use, but mobile services usually only implement a set of functions that are actually relevant in mobile contexts. Ways of communication may be very different from the home PC or classic web services. Additionally, we also have platforms that perform only core functionality, which are then built upon by third-party services. Only through services like Twitpic has it become possible to share photos easily through Twitter; services like foodspotting (http://foodspotting.com) or Blip.fm (http://blip.fm) offer a thematic frame and try to combine users into sub-networks based on their common interests.
We will now look at some more examples of specific forms of mobile data.
Since the late 1990s, research has been carried out, for example, on the meaning and usage of mobile phones by young people (Lenhart and Madden, 2007). Numerous studies have attempted to show that mobile phones have drastically changed young people's habits of communication, and also that mobile phones are a big part of the processes of socialization and identity creation (Ling, 2004; 2007). Mobile services like SMS text messaging or messaging through Twitter are used by young people to uphold their social networks and to present themselves within their peer groups.
To capture these user habits, researchers have mostly used questionnaires or interviews, an example of the increasingly common practice of combining quantitative and qualitative methods (see Purwandari et al., 2011). Direct access to a set of exchanged messages could be quite problematic: for obvious legal and ethical reasons, mobile service providers cannot provide access to private data, although they might do their own research on their own data to optimize the services they offer. Other service providers (e.g. on the social web) might offer dedicated interfaces to create data collections and thereby encourage analysis. With the commercialization of many of these services, however, those options are fading again. One needs programming skills to be able to use those techniques in the first place. If we again take aspects of research ethics into account, it seems necessary for the researcher to have direct contact with each and every user. Only then is it possible to ask for adequate permission and also to get access to individual user strategies. Ethnographic observation might also be tricky because it may be hard for the researcher to access the actual real-life context of a virtual communication act (research on mediated rituals could be one example; see Ling, 2008). Also, the problem of the researcher's presence, which may influence and possibly change the outcome of the research, probably needs to be re-evaluated.
Structural features, like those we have examined in this chapter in regard to online communities and social networks, are also present in the mobile realm, but they might follow different rules (such as the limitations of pre-structured data mentioned earlier). The infrastructure of mobile apps and services might follow some mobile-specific patterns. Additional sources of information (such as GPS location data, or photos and videos from integrated cameras) might be central artefacts in a communication act and therefore key to the construction of meaning. It is also clear that mere textual analysis may no longer be sufficient in such cases: picture analysis may be an additional tool, but it will create new kinds of data. The researcher needs to be able to handle the data, which means that clever ways of reflected triangulation are necessary. Many recent studies have accessed the field through quantitative methods, and there seems to be room for exploring qualitative research in mobile contexts. But at the moment there are some barriers that keep hindering such efforts.
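As a small technical aside on the GPS location data mentioned above: photo metadata typically stores coordinates as degrees, minutes and seconds together with a hemisphere reference, and converting them to decimal degrees is a routine first step before such data can be mapped or compared. The sketch below shows that conversion; the coordinate values are invented for illustration, and reading the raw metadata from an actual photo would require an additional metadata library.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert a degrees/minutes/seconds coordinate to signed decimal
    degrees; southern and western hemispheres become negative."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value

# Invented example coordinate, converted for further processing.
lat = dms_to_decimal(48, 8, 14.0, "N")
lon = dms_to_decimal(11, 34, 31.0, "E")
print(round(lat, 4), round(lon, 4))
```

Even this simple step produces a new, derived data type that the researcher must then integrate with the textual and visual material, which is precisely the triangulation problem raised above.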
[Page 461]It is not yet certain whether mobile data really will become a category in their own right alongside other virtual data. It is possible that the merging of virtual stationary and virtual mobile spaces is just another step towards the ubiquity of cyberspace. This could imply a change for the Internet, which would simply integrate all the mobile aspects just mentioned. As the evolution of the Internet has shown in the past, we may see strong paradigm shifts within only one generation of users. The Internet has changed many times and will probably continue to do so. For researchers this means that it is imperative to look very closely at the usage context of virtual data and to take it into account. This is, of course, true for social research in general, but it becomes more central in cases like the ones we have just described.
Online research does not bring up completely new problems or challenges; rather, it changes several parameters of issues that were previously known. Even if online data are accessible, the researcher needs to consider data collection legislation. Furthermore, a researcher has to consider that even if data are publicly accessible, users may not necessarily be aware that their information can easily be extracted and stored. Interaction with participants also involves several aspects that need to be made explicit. For example, an online evaluation mostly takes place in public spaces, so other users are witnesses as well. Even if there is no personal contact, it is appropriate to communicate the research interest and to specify the purposes of data collection.
The collected data should be accessible to participants; they should be able to know what data have been collected about themselves.
A direct citation from a newsgroup or discussion board, for example, can easily be traced back, thanks to the power of modern search engines, even if the author is anonymized. As a result, users can find out further information about the cited person. A person should be able to feel comfortable being connected with the data, yet individuals may wish to remain anonymous. A lack of data protection could deeply damage the trust between researcher and participant: ‘Researchers need to consider the long-term implications of data protection issues at an early stage – they strike at the very heart of traditional qualitative research methodology’ (Mann and Stewart, 2000: 41). Researchers also need to consider the authenticity of data; it therefore often seems necessary not only to categorize the types of data, but also to identify important and relevant actors and their roles. As in classical qualitative research processes, the researcher needs to get a feel for what is happening and how data are being generated in the specific contexts. In each case, the researcher has to determine whether it would be useful or even essential to gather more information online and perhaps beyond cyberspace.
Virtual data and all of the subtypes of data, like mobile data, are becoming increasingly recognized, and not just in qualitative research. Current research on cyberspace is determined by two parallel lines of development. On the one hand we find a consolidation of the many types of data and research methods we have mentioned in this chapter into a separate discipline of Internet or Web studies. On the other hand, cyberspace, due to its ubiquity and interconnection with everyday life, can no longer be seen as a separate field, ignoring its dependence on real-world contexts (Wellman and Haythornthwaite, 2002; Wellman, 2004). Consequently, virtual data become increasingly important even to research projects that do not primarily focus on the Internet itself. It follows from this integration that new interfaces have to be created and that established methods need to incorporate virtual data into the research process. Discovery as well as explication of [Page 462]these distinct types of data often require individual as well as interdisciplinary strategies and thus pose new challenges to many research disciplines.
Recent developments have shown that mobile data are becoming more and more important and that cyberspace is becoming even more complex. This already influences research practicalities, for example when relevant data need to be identified and selected for further research from the numerous services and types of data. The advancing degree of complexity is illustrated not only by an ever-growing number of web services, but also by their increasingly complex interlinking, which could lead to highly individualized and selective usage. It is certainly possible that well-known and common patterns, such as those observed in the evolution towards the social web, will be seen here again. For researchers, the challenges emerge from the varieties of data and their particular specifics concerning services and formats. Consequently, the researcher needs to align methodical and methodological considerations with the socio-technological circumstances in the field.