Web studies

Bon Adriel Aseniero and Terrance Mok

Overview

Human–Computer Interaction (HCI) informs the design of computing systems with an emphasis on human factors. Hence many of the research methodologies in HCI involve the study of human cognition and behaviour. For example, studies such as lab experiments, questionnaires, and deployment studies are used in HCI to evaluate the usability of computing systems. In this topic review, we will be presenting a form of deployment study.

A deployment study is a research method in which a system is released into the wild for use by the people it was designed for. This allows system designers to evaluate their system in the most realistic settings possible and to assess real-world experiences. Deployment studies thus give us opportunities to understand how a system is used in context, as opposed to the hypothetical scenarios of lab experiments. Deployment studies come in two main types: field deployment (in workplace, home, and public settings) and web deployment. We will deal specifically with web deployment studies, examining their pros and cons and their implications for research in general.

For the remainder of this topic review, we will introduce web deployment studies: we describe their key concepts in detail, present their history, outline their pros and cons, and provide an example exercise on designing and running a web deployment study.

Introduction

Web/Online studies

The internet allows researchers to reach large numbers of diverse participants, which can be ideal for evaluating systems. More importantly, web studies are ideal for studying systems that are designed for web use. Researchers who are studying their systems online can pick from a variety of web studies, ranging from simple online surveys to more complex evaluation techniques that study online behaviours. In essence, any evaluation that involves the internet is considered to be a web study, and as with any research evaluation, it must be well thought out and planned in order to get meaningful answers.

Web studies are flexible, allowing a researcher to design studies that are quantitative, qualitative, or a mix of both. Researchers can study a system using a top-down approach, presenting online users with segmented tasks to perform while using the system. This approach is good for more controlled experiments measuring interaction variables such as click accuracy or time taken for task completion. Another approach is a more open-ended, bottom-up one, where users are free to use the deployed system in their own context, similar to in-the-wild types of field deployment studies. In this approach, researchers must rely on user activity tracking in order to piece together the users’ usage experience. This can be combined with questionnaires after the study to explore correlations between different types of measurements.

Controlled web studies

According to Kohavi et al., researchers who wish to take a top-down approach, or run controlled experiments online, need to think about seven rules of thumb: (1) small changes can have a big impact to key metrics, (2) changes rarely have a big positive impact to key metrics, (3) your mileage will vary, (4) speed matters, (5) reducing abandonment is hard, shifting clicks is easy, (6) avoid complex designs, and (7) have enough users (Kohavi et al., 2014).

Open-ended web studies

Researchers using a bottom-up approach need to track their users’ every move (Atterer et al., 2006).

History

When looking at the field of human-computer interaction (HCI) we find that researchers have been influenced by experimental methods used in psychology. Early research on interaction methods included the use of laboratory studies.

In the 1960s, Douglas Engelbart and his team at the Stanford Research Institute ran laboratory experiments which determined that the computer mouse was the best input device, especially with experienced subjects, for performing screen selection during text manipulation. The experiments ran in the laboratory, with task explanations given to subjects before they performed text selection with the various input devices. Speed and error rate were the dependent variables measured to determine the best devices.

While laboratory experiments offer great control to the researcher, their conclusions may be difficult to apply to the real world. In an attempt to get more “natural” results, researchers make use of field experiments. A field experiment studies users in their normal environments with as little interference as possible, other than some condition the researcher wishes to manipulate for their study.

The famous experiments at Hawthorne Works (a Western Electric factory) were performed to test the productivity of workers under differing levels of light. Productivity was measured as experimenters exposed the workers to low and high light environments. The study was run with real factory workers at the factory rather than in a sterile laboratory environment. Observing participants in natural settings gives a sense that the results can be applied in the real world.

People’s everyday lives now consist of many online activities and interactions. To study people’s real-world usage of internet technology, researchers have been deploying their systems online. Web deployment studies are a type of field experiment where the technology being studied is taken out of a laboratory setting and instead looked at in the real world, online.

Strengths

One of the greatest advantages of deploying a technology to the web is gathering a large number of users; studies can reach hundreds of thousands to millions of users when they can be accessed online. Additionally, having a technology available online allows a study to have participants from anywhere in the world with internet access.

Studies placed on the internet also have the ability to track many factors of user interaction such as click rates, page impression count, and usage time.

Web deployment is good at capturing real-world usage of a technology. Having real users interact with your technology in real-world scenarios gives a “natural” quality to your results. In a laboratory experiment, we need to worry that our results might not be applicable outside the lab.

By deploying technology, we gain access to data about social media interactions such as sharing, commenting and “liking”. For example, a study could track how often a story on a social media site is shared when we change the font of the story’s title. Studies like this, which rely on real social behaviour around technology usage, would be impossible to run without web deployment.

Challenges

Putting your technology on the web is no guarantee that people will 1) find it and 2) use it. The greatest advantage of deploying to the internet is access to a vast number of users, but attracting those users can be difficult. Millions of web pages, banner ads and funny cat videos are competing with your online study. Entire companies are built on search engine optimization (SEO), trying to get their web pages to the top ranks of search engines, and you too need people to find your deployed technology.

Web deployment takes place in the real world and can provide data that is hard to filter. Since we have no control over the users or their environments, there can be many outside factors that impact the data. For example, if we try to measure the speed of performing a task on a web page, we may get results that include users who started the task, left to get a coffee, and then came back to complete the task.
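
One way to soften the coffee-break problem is to count only the time a user was plausibly active. The sketch below is a minimal illustration, not a method taken from the cited papers: it assumes each task has a list of interaction timestamps in seconds, and the function name and the 60-second idle threshold are hypothetical choices that would need tuning for a real study.

```python
from typing import Sequence


def active_task_seconds(event_times: Sequence[float],
                        idle_threshold: float = 60.0) -> float:
    """Sum the gaps between consecutive events, skipping gaps longer
    than idle_threshold (assumed to be time away from the task)."""
    times = sorted(event_times)
    total = 0.0
    for earlier, later in zip(times, times[1:]):
        gap = later - earlier
        if gap <= idle_threshold:
            total += gap
    return total


# Hypothetical log: the user pauses for 300 s between the third and fourth event.
events = [0.0, 4.0, 9.0, 309.0, 315.0, 321.0]
raw_seconds = max(events) - min(events)       # includes the long pause
active_seconds = active_task_seconds(events)  # long pause is excluded
print(raw_seconds, active_seconds)
```

Reporting both the raw and the filtered duration keeps the effect of the filtering visible to readers.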

Powerful user tracking through tools such as cookies, IP addresses and logins gives a study access to user data. We must be careful in the approaches we take to track users, for security and privacy reasons. For instance, if our web technology requires a user to log in with an email and password, we need to make sure that proper security measures are put in place. Everything on the web has the potential to be hacked.

Upkeep. If your research presents a publicly available web technology, readers of your papers will expect to be able to use your system themselves. You need to keep domains registered and servers available for as long as you want readers of your paper to use your system. For example, the PeopleGarden paper from the MIT Media Laboratory gives a URL which currently returns a 404 error; this is potentially off-putting to readers of the paper.

Uncompleted tasks. Users may give up partway through your set of tasks, possibly leaving you with many incomplete records.

Different browsers/technology. Internet Explorer, Firefox, Chrome, and Opera may render web pages differently. Within each of those browsers, users may have plugins/extensions which can alter how your web technology runs.
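
To see where users give up, it can help to compute a simple completion funnel. The sketch below assumes that, for each user, we have logged the furthest task step they completed (0 meaning none); the function name and data layout are hypothetical.

```python
from typing import List, Sequence


def completion_funnel(furthest_step_per_user: Sequence[int],
                      n_steps: int) -> List[float]:
    """For each step 1..n_steps, return the fraction of users who
    completed at least that step."""
    n_users = len(furthest_step_per_user)
    if n_users == 0:
        return [0.0] * n_steps
    return [
        sum(1 for s in furthest_step_per_user if s >= step) / n_users
        for step in range(1, n_steps + 1)
    ]


# Hypothetical data: five users, three task steps.
funnel = completion_funnel([3, 3, 1, 0, 2], n_steps=3)
print(funnel)
```

A sharp drop between two adjacent steps points at the task where most users abandon the study.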

Examples of web deployment studies

Facebook

Facebook (along with many other internet companies) often performs A/B testing with its products. The company alters its webpage, exposing one set of users to condition A and another set of users to condition B. For example, one set of users may be shown a red logout button (condition A) while other users are shown a blue logout button (condition B). In an example such as this, Facebook might be interested to see whether having a red logout button causes people to log out of Facebook more often.

One infamous example of this type of testing involved Facebook manipulating the news feeds of users, exposing them to more or fewer positive or negative news feed items. Researchers then measured users’ status posts after they had seen those items, to see whether they posted more positive or more negative statuses.
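
A common way to split users between conditions is to hash a stable user identifier, so that the same user always sees the same condition. The sketch below illustrates that general idea only; it is not Facebook’s actual mechanism, and the function name, the experiment label and the user IDs are all hypothetical.

```python
import hashlib
from typing import Sequence


def assign_condition(user_id: str, experiment: str,
                     conditions: Sequence[str] = ("A", "B")) -> str:
    """Deterministically map a user to one condition of an experiment."""
    # Including the experiment name means that the same user can land in
    # different conditions across unrelated experiments.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(conditions)
    return conditions[bucket]


# Hypothetical usage: decide which logout button colour a user is shown.
condition = assign_condition("user-12345", "logout-button-colour")
button_colour = "red" if condition == "A" else "blue"
print(condition, button_colour)
```

Because the assignment depends only on the identifier and the experiment name, no per-user assignment table needs to be stored, and a returning user is never switched between conditions.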

http://www.pnas.org/content/111/24/8788.full

Where to Go on Your Next Trip? Optimizing Travel Destinations Based on User Preferences.

The study was performed with a real online travel website, www.booking.com. A/B testing was used to run randomized controlled trials, splitting users between the baseline and the new version of the website. Conversion rates were measured by counting clicked results from real users of the system. The authors used the G-test statistic (G-test of goodness of fit), reporting the new method as significant where the G-test p-value was larger than 90%.

http://dl.acm.org/citation.cfm?id=2766462.2776777
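
To make the statistic more concrete, the sketch below computes a G-test for a two-condition conversion experiment. It is an illustration under stated assumptions rather than the authors’ analysis: it uses the G-test of independence on a 2x2 table (converted or not, by condition), the counts are invented, and the function name is hypothetical. It returns the conventional p-value, where small values suggest that the conversion rates differ.

```python
import math
from typing import Tuple


def g_test_2x2(conv_a: int, total_a: int,
               conv_b: int, total_b: int) -> Tuple[float, float]:
    """Return (G statistic, p-value) for a 2x2 conversion table."""
    observed = [[conv_a, total_a - conv_a],
                [conv_b, total_b - conv_b]]
    grand = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [conv_a + conv_b, grand - conv_a - conv_b]
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = row_totals[i] * col_totals[j] / grand  # expected count
            if o > 0:  # a zero cell contributes nothing to the sum
                g += 2.0 * o * math.log(o / e)
    g = max(g, 0.0)  # guard against tiny negative rounding errors
    # With 1 degree of freedom, the chi-square upper tail is erfc(sqrt(G/2)).
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


# Invented counts: 100 of 1000 baseline users convert, 150 of 1000 in the new version.
g_stat, p = g_test_2x2(100, 1000, 150, 1000)
print(g_stat, p)
```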

Storytelling in Information Visualizations: Does it Engage Users to Explore Data?

Experiments were put online on a popular news and opinion outlet, Mediapart, and on a popular visualization gallery website, www.visualizing.org. The study looked at multiple visualizations.

Case 1 - CO2 Pollution Explorer: The Mediapart page resulted in 2975 sessions and the visualization gallery received roughly 4000 unique connections. The study compared standard web analytics (i.e., total uptime and click-count). The results included normally distributed data, and confidence intervals were reported for various measures (meaningful interactions, time spent in section, time on webpage).

Boy et al., 2015 (reference 3 below)
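
For roughly normally distributed measures such as time on a page, a confidence interval for the mean can be sketched as below. This uses a plain normal approximation and invented session times; it is not necessarily how the authors computed their intervals, and the function name is hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_confidence_interval(samples: Sequence[float],
                             confidence: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation confidence interval for the mean."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    standard_error = statistics.stdev(samples) / math.sqrt(n)
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean - z * standard_error, mean + z * standard_error


# Invented data: seconds spent on the page in ten sessions.
session_seconds = [42.0, 55.5, 38.0, 61.0, 47.5, 52.0, 44.0, 58.5, 49.0, 50.5]
low, high = mean_confidence_interval(session_seconds)
print(low, high)
```

With small samples, a t-based interval would be somewhat wider than this approximation.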

PeopleGarden

PeopleGarden was deployed to two groups of users to track how involved participants were in online groups. Unfortunately, the URLs in the paper are no longer reachable, showing that upkeep is a challenge, as previously discussed.

http://dx.doi.org/10.1145/320719.322581

Your own web study

Looking to try out your own web study? Some of these references will give you a good rundown of important things to consider.

Designing and deploying online field experiments. http://doi.org/10.1145/2566486.2567967

Seven rules of thumb for web site experimenters. http://dl.acm.org/citation.cfm?id=2623330.2623341

Cookies: A deployment study and the testing implications. http://dl.acm.org/citation.cfm?id=1541822.1541824

Knowing the user’s every move: user activity tracking for website usability evaluation and implicit interaction. http://dx.doi.org/10.1145/1135777.1135811

This last paper in particular is a good resource on the different types of user activity tracking we can do with standard web technology.

"Small and concrete": Trivially extracted information about simple interactions with elements in the webpage, such as the time an input field was filled in or the time a button was clicked.
"Small and abstract": Information extrapolated from a log of user actions. For example, we could deduce that a user did not read all the instructions on a page if they completed the corresponding form too quickly, or that "The user seems to have had trouble deciding on an action, as he repeatedly hovered the mouse over the alternatives." [16]
"Large and concrete": More broad/general information about a user that spans multiple tests. For example: "The user is very precise when clicking on targets, but very slow when typing." [16]
"Large and abstract": General background information for a user such as their computer experience.

Additionally, a common starting place for standard web-based tracking of user activity (e.g., page visits, page bounces, link follows) is Google Analytics https://www.google.ca/analytics/. Google Analytics is targeted towards businesses, to help them gain users and improve their websites, but it does offer helpful tools to anyone running a web study. Google Analytics is not made for individual user tracking; rather, it gives an overview of the use of your webpage.

There are many alternative frameworks available, depending on the types of tracking you would like to do. One commonly recommended open-source alternative to Google Analytics is Piwik http://piwik.org/ (since renamed Matomo). Piwik can be installed on your own servers, so you have control over the gathered data.
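
Whichever tool collects the events, the "small and abstract" category described above comes from interpreting a concrete log. The sketch below shows one way such rules could look. The event format, the function name and both thresholds are hypothetical; it is not an implementation from the Atterer et al. paper.

```python
from collections import Counter
from typing import Dict, List


def infer_abstract(events: List[Dict], min_read_seconds: float = 5.0,
                   hover_limit: int = 3) -> List[str]:
    """Derive tentative 'abstract' observations from a concrete event log."""
    notes = []
    loads = [e["t"] for e in events if e["type"] == "page_load"]
    submits = [e["t"] for e in events if e["type"] == "submit"]
    # A form submitted very soon after the page loaded suggests skimming.
    if loads and submits and min(submits) - min(loads) < min_read_seconds:
        notes.append("may not have read the instructions")
    # Repeated hovering over several alternatives suggests indecision.
    hovers = Counter(e["target"] for e in events if e["type"] == "hover")
    revisited = [target for target, n in hovers.items() if n >= hover_limit]
    if len(revisited) >= 2:
        notes.append("may have had trouble deciding")
    return notes


# Hypothetical log for one user on one page (t is in seconds).
log = [
    {"t": 0.0, "type": "page_load", "target": "form"},
    {"t": 0.8, "type": "hover", "target": "option-1"},
    {"t": 1.1, "type": "hover", "target": "option-2"},
    {"t": 1.5, "type": "hover", "target": "option-1"},
    {"t": 1.9, "type": "hover", "target": "option-2"},
    {"t": 2.4, "type": "hover", "target": "option-1"},
    {"t": 2.8, "type": "hover", "target": "option-2"},
    {"t": 3.5, "type": "submit", "target": "form"},
]
print(infer_abstract(log))
```

Inferences like these are guesses about intent, so they are best treated as prompts for follow-up questions rather than as measurements.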

Designing and running

For this topic review, we will discuss and perform an exercise on designing and running an open-ended web study. We will be tracking user behaviour online. Our sample study tests for the best location to put an ad in a person's newsfeed by tracking the following:

1. Which ads get clicked more.

2. Which ads get hovered over more often.

3. How long it takes before a person clicks on an ad after activating the page.

A zip file containing code to run our basic webpage with tracking is available at https://www.dropbox.com/s/6a670d7ght3zojl/Web%20Tutorial.zip?dl=0
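
Once events have been logged, the three measurements listed above can be summarized offline. The sketch below is independent of the code in the zip file. It assumes a hypothetical log format in which each event has a timestamp t in seconds, a type (page_active, hover or click) and, for ad events, an ad position label.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def summarize_ad_events(events: List[Dict]) -> Tuple[Counter, Counter, Optional[float]]:
    """Return (clicks per ad, hovers per ad, seconds from the page becoming
    active to the first ad click, or None if no such click was logged)."""
    clicks: Counter = Counter()
    hovers: Counter = Counter()
    page_active_at = None
    first_click_delay = None
    for e in sorted(events, key=lambda ev: ev["t"]):
        if e["type"] == "page_active" and page_active_at is None:
            page_active_at = e["t"]
        elif e["type"] == "hover":
            hovers[e["ad"]] += 1
        elif e["type"] == "click":
            clicks[e["ad"]] += 1
            if first_click_delay is None and page_active_at is not None:
                first_click_delay = e["t"] - page_active_at
    return clicks, hovers, first_click_delay


# Hypothetical session with ads placed at the top, middle and bottom of a newsfeed.
session = [
    {"t": 2.0, "type": "page_active"},
    {"t": 5.0, "type": "hover", "ad": "top"},
    {"t": 9.0, "type": "hover", "ad": "middle"},
    {"t": 12.5, "type": "click", "ad": "middle"},
    {"t": 20.0, "type": "hover", "ad": "middle"},
    {"t": 31.0, "type": "click", "ad": "bottom"},
]
clicks, hovers, delay = summarize_ad_events(session)
print(clicks.most_common(), hovers.most_common(), delay)
```

Aggregating these per-session summaries across users then shows which ad position attracts the most clicks and hovers.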

References

1. Eytan Bakshy, Dean Eckles, and Michael S. Bernstein. 2014. Designing and deploying online field experiments. Proceedings of the 23rd international conference on World wide web - WWW ’14, ACM Press, 283–292. http://doi.org/10.1145/2566486.2567967

2. Robert M Bond, Christopher J Fariss, Jason J Jones, et al. 2012. A 61-million-person experiment in social influence and political mobilization. Nature 489, 7415: 295–8. http://doi.org/10.1038/nature11421

3. Jeremy Boy, Jean-Daniel Fekete, Françoise Detienne, et al. 2015. Storytelling in Information Visualizations: Does it Engage Users to Explore Data? Proceedings of the 33rd ACM Conference on Human Factors in Computing Systems (CHI 2015).

4. Nicholas A. Christakis and James H. Fowler. 2011. Social Contagion Theory: Examining Dynamic Social Networks and Human Behavior. Retrieved October 2, 2015 from http://arxiv.org/abs/1109.5235

5. Marian Dörk, Daniel Gruen, Carey Williamson, and Sheelagh Carpendale. 2010. A visual backchannel for large-scale events. IEEE transactions on visualization and computer graphics 16, 6: 1129–38. http://doi.org/10.1109/TVCG.2010.129

6. Marian Dörk, Carey Williamson, and Sheelagh Carpendale. 2012. Navigating tomorrow’s web. ACM Transactions on the Web 6, 3: 1–28. http://doi.org/10.1145/2344416.2344420

7. Steve Haroz, Robert Kosara, and Steven L. Franconeri. 2015. ISOTYPE Visualization – Working Memory, Performance, and Engagement with Pictographs. Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems - CHI ’15, ACM Press, 1191–1200. http://doi.org/10.1145/2702123.2702275

8. Julia Kiseleva, Melanie J.I. Mueller, Lucas Bernardi, et al. 2015. Where to Go on Your Next Trip? Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR ’15, ACM Press, 1097–1100. http://doi.org/10.1145/2766462.2776777

9. Ron Kohavi, Alex Deng, Roger Longbotham, and Ya Xu. 2014. Seven rules of thumb for web site experimenters. Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, 1857–1866.

10. Adam D I Kramer, Jamie E Guillory, and Jeffrey T Hancock. 2014. Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences of the United States of America 111, 24: 8788–90. http://doi.org/10.1073/pnas.1320040111

11. Tanushree Mitra, C.J. Hutto, and Eric Gilbert. 2015. Comparing Person- and Process-centric Strategies for Obtaining Quality Data on Amazon Mechanical Turk. Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems - CHI ’15, ACM Press, 1345–1354. http://doi.org/10.1145/2702123.2702553

12. Kerry Rodden, Hilary Hutchinson, and Xin Fu. 2010. Measuring the user experience on a large scale. Proceedings of the 28th international conference on Human factors in computing systems - CHI ’10, ACM Press, 2395. http://doi.org/10.1145/1753326.1753687

13. Andrew F Tappenden and James Miller. 2009. Cookies: A deployment study and the testing implications. ACM Transactions on the Web (TWEB) 3, 3: 9.

14. Steve Whittaker, Loren Terveen, Will Hill, and Lynn Cherny. 1998. The dynamics of mass interaction. Proceedings of the 1998 ACM conference on Computer supported cooperative work - CSCW ’98, ACM Press, 257–264. http://doi.org/10.1145/289444.289500

15. Rebecca Xiong and Judith Donath. 1999. PeopleGarden. Proceedings of the 12th annual ACM symposium on User interface software and technology - UIST ’99, ACM Press, 37–44. http://doi.org/10.1145/320719.322581

16. Richard Atterer, Monika Wnuk, and Albrecht Schmidt. 2006. Knowing the user’s every move: user activity tracking for website usability evaluation and implicit interaction. Proceedings of the 15th international conference on World Wide Web - WWW ’06, ACM Press, 203–212. http://dx.doi.org/10.1145/1135777.1135811

17. William K. English, Douglas C. Engelbart, and Melvyn L. Berman. 1967. Display-selection techniques for text manipulation. IEEE Transactions on Human Factors in Electronics 1: 5–15.

Resources

Web Deployment presentation slides

A zip file containing code to run our basic webpage with tracking https://www.dropbox.com/s/6a670d7ght3zojl/Web%20Tutorial.zip?dl=0