Andreas Weigend | Social Data Revolution | Fall 2012
School of Information | University of California at Berkeley | INFO 290A-3

Class 1: August 27, 2012

Responsible for initial page (up by 10pm on Thursday after class):

Class materials:

The Social Data Revolution

Definition of Social Data

Social Data denotes:
  • Relationships extracted from communication behavior between individuals ("the social graph", e.g., derived from phone calls or from comments on Facebook)
  • Information that is shared ("socialized") on social media platforms or elsewhere (e.g., real-time location shared via Google Latitude)
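
As a minimal sketch of the first kind of social data, the snippet below derives a weighted social graph from a communication log. The log entries, the names in it, and the helper `build_social_graph` are all hypothetical, invented for illustration; the number of interactions between two people is used as a rough stand-in for the strength of their tie.

```python
from collections import Counter

# Hypothetical communication log: (sender, receiver) pairs, e.g. phone calls.
calls = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("alice", "carol"),
    ("alice", "bob"),
]

def build_social_graph(events):
    """Count interactions per unordered pair; the count acts as tie strength."""
    edges = Counter()
    for sender, receiver in events:
        if sender != receiver:  # ignore messages to oneself
            edges[frozenset((sender, receiver))] += 1
    return edges

graph = build_social_graph(calls)
for pair, weight in graph.items():
    print(sorted(pair), weight)
```

Treating the pair as unordered is a design choice: it ignores who initiated the contact. A directed graph would keep `(sender, receiver)` tuples as keys instead.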

Definition of Social Data Revolution

The Social Data Revolution (SDR) is a shift in the mindset of individuals towards data they knowingly and voluntarily share with a potentially large audience. It impacts their notion of self, as well as of relationships, both with other individuals and with companies and organizations.
Individuals are increasingly willing to create and share data and they increasingly expect to get some value in exchange. Not only does it change how we view ourselves, our friendships, and our personal decision making, but also the way in which businesses look at existing sectors. In short, SDR has created new social norms.
(See also the abstract and slides at

Application of Social Data (Example): The Evolution of Recommendation Systems

There are several stages in recommendation systems, which can be characterized by the data they use:
  • Merchandizing
    • Merchandizing is the practice where humans, such as salespeople, decide what they think will sell well for a company without using data.
  • Clicks
    • Clicks are the collective intelligence of the customers, where the customers think together about what items they should buy or what they might buy in addition. Amazon uses data generated through "Clicks" to analyze the behavior of its customers.
      1. Purchases. Purchase data tells us what a customer actually bought.
      2. Item Viewed. Item Viewed tells us about the substitute products that a customer considered buying.
    • By looking at Purchases and Item Viewed together, Amazon can tell what people will eventually buy. This example demonstrates how people generate Consumer to Business (C2B) data using Amazon's online application. The "Clicks" are implicit social data.
  • Reviews
    • Reviews are explicit social data created by people when they discuss a product. Reviews are Consumer to World (C2W) data, since one person writes a review and anyone can read it.
    • One major question regarding Reviews is how we know which reviews we can trust. To address this issue, Amazon, for example, displays the reviewer's real name, which is obtained from the credit card information collected via the reviewer's purchases. Using real names helps to reduce fraud and increase trust in the reviews.
  • Social Graph or Connections
    • This is the type of data that can be collected from the social graph. For example, people recommend products to each other through social relations, such as a referral program. People tend to share products they have purchased and/or are interested in with others who are connected with them.
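
To make the "Clicks" stage concrete, here is a minimal sketch of combining Item Viewed and Purchases data to estimate what people eventually buy. It is not Amazon's actual method; the session records, item names, and the helpers `eventual_purchases` and `recommend` are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter, defaultdict

# Hypothetical sessions: items a customer viewed, and the item finally bought.
sessions = [
    {"viewed": ["camera_a", "camera_b"], "bought": "camera_b"},
    {"viewed": ["camera_a", "camera_c"], "bought": "camera_a"},
    {"viewed": ["camera_a", "camera_b"], "bought": "camera_b"},
]

def eventual_purchases(sessions):
    """For each viewed item, count which items were eventually bought."""
    bought_after_view = defaultdict(Counter)
    for session in sessions:
        for item in session["viewed"]:
            bought_after_view[item][session["bought"]] += 1
    return bought_after_view

def recommend(item, table, k=2):
    """Return up to k items most often bought by customers who viewed `item`."""
    return [bought for bought, _ in table[item].most_common(k)]

table = eventual_purchases(sessions)
print(recommend("camera_a", table))
```

In this toy data, customers who viewed `camera_a` most often ended up buying `camera_b`, so `camera_b` is ranked first.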

Innovating and Evaluating Recommendation Systems

Of the many recommendation systems a business could use to exploit social data, which one should a company choose? How can a recommendation system be evaluated? One answer is to use the PHAME framework.

PHAME Methodology

PHAME is a framework ("PHAMEwork") for innovation based on the scientific method:
  • P: Problem - Define the Problem
  • H: Hypothesis - Create Hypotheses
  • A: Action - Suggest different Actions
  • M: Metrics - Combine measurements into Metrics
  • E: Experiment - Conduct Experiments

Example of PHAME Application: co-branded credit card

A revenue source for Amazon is getting customers to sign up for a co-branded credit card: Amazon receives $100 for referring a customer to Chase, plus an additional $30 to pass on to the customer.
Amazon came up with two competing hypotheses:
  • Give customers $30 towards their next purchase so they will think about Amazon and become a repeat customer.
  • Give customers $30 now so they'll sign up now.

Amazon ran an experiment in which half of its customers were offered the $30 right away and the other half were offered the money later. Significantly more customers signed up when the money was offered immediately.
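
The M and E steps of PHAME can be sketched as a comparison of sign-up rates between the two groups. The class notes do not give the actual numbers or the test Amazon used, so the counts below are made up purely for illustration, and the two-proportion z-test is one common choice, assumed here rather than taken from the lecture.

```python
import math

def two_proportion_z_test(signups_a, n_a, signups_b, n_b):
    """Two-sided z-test for the difference between two sign-up rates."""
    p_a = signups_a / n_a
    p_b = signups_b / n_b
    # Pooled rate under the null hypothesis that both offers perform equally.
    pooled = (signups_a + signups_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts (NOT real Amazon data): group A gets $30 now,
# group B gets $30 towards the next purchase.
z, p_value = two_proportion_z_test(signups_a=300, n_a=10_000,
                                   signups_b=200, n_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.2g}")
```

A small p-value would support the claim that the difference in sign-ups is unlikely to be due to chance alone; the metric (sign-up rate) has to be fixed before the experiment is run.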

First Building Block of Social Data: Communication

Communication Channels:

There are many existing channels for communication nowadays:
  • E-mail
  • Web-chat
  • IM
  • Facebook
  • Phone
  • Mail
  • Fax (still used for some official communications)
  • SMS/Text Messages
  • VoIP (Skype, etc)
  • Messaging services on Phones (e.g. Whatsapp, WeChat)
  • please add others

Communication Dimensions:

Each communication channel can be characterized along a variety of dimensions:
  • Synchronous / Asynchronous - Communication happens simultaneously (synchronous), or request and reply happen at different times (asynchronous).
  • Probability of response - Probability of receiving a response from the other party after initiating communication.
  • Time to send a message - Time taken for the sender to send the message.
  • Time of response - Time taken for the receiver of the message to respond.
  • Effectiveness of Communication - How effectively the meaning of the message is perceived by the receiving end.
  • Persistent / Ephemeral - Whether a record of the communication is kept. Persistent: the record is kept, e.g., a government archiving every email exchanged between high-level officials. Ephemeral: the record is not kept and is lost when the communication ends, e.g., a conversation between two people on the street.
  • Formal / Informal - Communication that either conforms to established professional rules, standards, and processes and avoids slang (formal), or ignores established rules (informal).
  • Signal-to-noise ratio - The level of desired signal compared to background noise in the communication.
  • Anonymity - Whether the identities of the communicating parties are known or not.
  • Proximity - Whether the communicating parties are located close to each other.
  • Cost for the sender - Cost for sending a message.
  • Cost for the receiver - Cost for receiving a message.
  • Effort - Effort (time, money, resources) spent on establishing the communication.
  • Accuracy - Whether the message is transmitted correctly, without loss of information.
  • Architecture - The type of architecture of the communication channel, for example, the network stack it runs over.
  • Trust - Whether the communicating parties can trust each other's identity.

Second Building Block of Social Data: Identity

Guest Speaker: Prof. Quentin Hardy from I School @ UC Berkeley

Quentin's Perspective on Identity:

Characteristics of identity:

  • Identity is the sense of where you come from, what you are doing today, and what you want people to perceive you as.
  • Which becomes true, how you see the world, or how the world sees you?
  • Identity is created by looking at how you react to the world and how the world reacts to you.
  • Identity is fragile, non-linear and changes over time. You rewrite your identity as you obtain new information from the world around you.
  • On the social graph, one's identity exists in relation to others' identities. Identities are derived from different patterns and affiliations.
  • People are obsessed with technology and rely on it to make decisions, thus identities are manipulated by technology.
  • Identity is created with urgency and desire. It is a story in a finite life, seeking for self, seeking for love. Software cannot deliver identity as software does not care to end.
  • People have two sets of identities: Internal Identity and External Identity. People interact with the world through their external identity. The reaction from the world changes people's internal identity.
  • People's internal identity is increasingly externalized.

Value of identity:

  • Identity becomes more valuable when the person has more control over the situation, e.g., when seeking social status or power.
  • A successful product gives people a sense of control and identity.

Andreas' Perspective on Identity (recent video from Predictive Analytics World, see HW1 in Assignments)

  • Identity is the pillar of social data. One key ingredient of identity is persistence.
  • Customer-centered identity is a symmetrical identity. When symmetrical identity is adopted, it changes the behavior of both the customer and the company.
  • A company will be successful if it creates symmetrical identities with its customers, so that customers know who they are dealing with and the company knows who the customer is.
  • The value of an identity depends on how much difference it makes to decisions. For example, the name on your passport determines how easily you can enter or leave a country.
  • There is a shift in defining identity, from personal attributes such as name and date of birth, to relations of identities.
  • An important issue is how to deal with incorrect data about our identities, and how to fix the mistakes in our identities.


As an example, see last year's Stanford SDR course wiki, but please note that the individual classes were only 75 minutes (whereas they are 3 hours here at Cal).