Linkage analysis on social networking data

Film poster in: Network (1976)

The mass adoption of Online Social Networks (OSNs) is often considered the spark that ignited the Big Data era, and it has created great opportunities for understanding many socio-economic phenomena in the modern world. Social networks are a rich source of content and linkage data that have been widely used in data analysis. Linkage data are commonly analyzed via social network analysis (SNA).

In this post you’ll read about:

  1. What is a Graph?
  2. What is Social Network Analysis?
  3. What are some analysis questions answered via SNA?
  4. What are some applications of social network analysis in the Big Data era?
  5. Why are social networks studied by industry and academia?
  6. How to extract data from online social networks?

Realize that everything connects to everything else. Learn to see.
Leonardo Da Vinci

What is a Graph?

A network is a collection of things that are connected to one another. Network concepts and techniques appear across a wide range of disciplines, because much of the world around us has a network structure: the economy, the human cell, traffic and roads, society, the internet, food webs, media and information can all be described as networks. In the most basic framework, a network is modelled as a graph G = (V, E), where V is a set of nodes (the things) and E is a set of edges that connect the nodes (the relations between the things).
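To make the definition G = (V, E) concrete, here is a minimal Python sketch. The node names and the helper `build_adjacency` are made up for illustration, and the graph is assumed to be undirected.

```python
# A tiny undirected graph G = (V, E); the names are illustrative only.
V = {"ann", "bob", "cat", "dan"}
E = {("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("cat", "dan")}


def build_adjacency(nodes, edges):
    """Map every node to the set of nodes it shares an edge with."""
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)  # undirected: the relation holds in both directions
    return adjacency


adjacency = build_adjacency(V, E)
print(sorted(adjacency["cat"]))  # ['ann', 'bob', 'dan']
```

An adjacency mapping like this is only one possible representation; an edge list or a matrix would carry the same information.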

What is Social Network Analysis?

Social Network Analysis (SNA) studies the structure of a network in order to gain insight into how the network “works” and to support decisions about it. It does so either by examining the characteristics of individual nodes and links (e.g. centrality) or by looking at metrics of whole-network cohesion (e.g. density).
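As a rough illustration of the two levels mentioned here, the sketch below computes degree centrality (a node-level measure) and density (a whole-network measure) for a small undirected graph. The toy data and function names are assumptions for this example, and degree centrality is only one of several centrality measures.

```python
# Toy undirected graph as an adjacency mapping (illustrative data).
adjacency = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"cat"},
}


def degree_centrality(adjacency):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def density(adjacency):
    """Existing edges divided by the number of possible edges."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    # Every undirected edge is stored twice, once per endpoint.
    edge_count = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    return 2 * edge_count / (n * (n - 1))


centrality = degree_centrality(adjacency)
print(max(centrality, key=centrality.get))  # cat
print(round(density(adjacency), 3))         # 0.667
```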

Graph theory and statistics are the core approaches in SNA. The former is used to investigate topological structure analytically and visually. Descriptive and inferential statistics, on the other hand, are used to address concerns about sampling from large networks and the reliability of observations.

The question of “description” in this context is:
‘What is the distribution of nodes, attributes, and edges and what is the shape of the distribution?’
The question of “inference” is:
‘How much confidence can I have that the pattern I see in the data I’ve collected is actually typical of some larger population, or that the apparent pattern is not really just a random occurrence?’1
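The two questions can be sketched in a few lines of Python. The description part computes the degree distribution of a toy graph. The inference part is a deliberately crude check, chosen for this example rather than taken from the cited source: it asks how often a random graph with the same number of nodes and edges shows a maximum degree at least as large as the observed one. All names and data are illustrative assumptions.

```python
import random
from collections import Counter
from itertools import combinations

adjacency = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"cat"},
}


def degree_distribution(adjacency):
    """Description: fraction of nodes having each degree."""
    counts = Counter(len(nbrs) for nbrs in adjacency.values())
    total = len(adjacency)
    return {deg: counts[deg] / total for deg in sorted(counts)}


def max_degree_p_value(adjacency, samples=2000, seed=0):
    """Inference: share of random graphs (same nodes, same edge count)
    whose maximum degree is at least the observed maximum degree."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    edge_count = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    observed = max(len(nbrs) for nbrs in adjacency.values())
    possible_edges = list(combinations(nodes, 2))
    hits = 0
    for _ in range(samples):
        degree = Counter()
        for u, v in rng.sample(possible_edges, edge_count):
            degree[u] += 1
            degree[v] += 1
        if max(degree.values()) >= observed:
            hits += 1
    return hits / samples


distribution = degree_distribution(adjacency)
p_value = max_degree_p_value(adjacency)
print(distribution)  # {1: 0.25, 2: 0.5, 3: 0.25}
print(p_value)
```

A large value from `max_degree_p_value` suggests that the observed hub could easily be a random occurrence; on a four-node toy graph that is exactly what one would expect.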

There are two ways to conduct a social network analysis:
(i) Batch, which presumes that the social network changes gradually over time.
(ii) Dynamic, which covers streaming data that evolve at high speed.
Dynamic analysis is used more often to analyze interaction data between nodes, whereas batch (static) analysis is often used to study network properties such as connectivity, density, degree, diameter and geodesic distance.
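The difference between the two modes can be sketched as follows. A batch analysis would recompute a metric from a full snapshot of the graph, whereas the hypothetical `StreamingGraph` class below updates its state as each interaction arrives, so metrics such as degree and density are available at any moment. The class and the event stream are assumptions made for this example.

```python
from collections import defaultdict


class StreamingGraph:
    """Keeps degree and density up to date as edges stream in."""

    def __init__(self):
        self.neighbours = defaultdict(set)
        self.edge_count = 0

    def add_edge(self, u, v):
        # Ignore self-loops and interactions that were already recorded.
        if u == v or v in self.neighbours[u]:
            return
        self.neighbours[u].add(v)
        self.neighbours[v].add(u)
        self.edge_count += 1

    def degree(self, node):
        return len(self.neighbours.get(node, ()))

    def density(self):
        n = len(self.neighbours)
        return 0.0 if n < 2 else 2 * self.edge_count / (n * (n - 1))


# A made-up stream of interactions; the third event repeats the first.
events = [("ann", "bob"), ("bob", "cat"), ("ann", "bob"),
          ("cat", "ann"), ("cat", "dan")]

graph = StreamingGraph()
for u, v in events:
    graph.add_edge(u, v)
    print(f"after {u}-{v}: density = {graph.density():.3f}")
```

Real streaming workloads also need to expire old edges and bound memory, which this sketch leaves out.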

What are some analysis questions answered via SNA?

The first thing one might suspect is that SNA can answer who follows or replies to whom on Facebook. In more detail, the analysis tasks of SNA include the following:

  • Network Structure: Why and how did it come to have such structure?
  • Processes and dynamics: How do information, behavior, and diseases spread?
  • Community Analysis: What are the communities in the social network?
  • Path Analytics: What is the shortest path between two nodes, e.g. the best possible route for traffic optimization in smart cities?
  • Connectivity Analytics: What are the connectivity patterns of edges, e.g. who are the ‘listeners’ in a social network?
  • Centrality Analytics: Which nodes are important for a specific analysis problem, e.g. who are the most influential colleagues?
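Two of the tasks above can be sketched with breadth-first search on an unweighted graph: finding a shortest path between two nodes, and splitting the network into connected components. Treating connected components as "communities" is a simplifying assumption for this example; real community analysis typically uses more refined methods. The data and function names are illustrative.

```python
from collections import deque

adjacency = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"cat"},
    "eve": {"fay"},
    "fay": {"eve"},
}


def shortest_path(adjacency, source, target):
    """Breadth-first search; returns one shortest path or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nbr in sorted(adjacency[node]):
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return None


def connected_components(adjacency):
    """Groups of nodes that can reach each other."""
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components


print(shortest_path(adjacency, "bob", "dan"))  # ['bob', 'cat', 'dan']
print(shortest_path(adjacency, "ann", "eve"))  # None
print(len(connected_components(adjacency)))    # 2
```

For weighted networks such as road maps, an algorithm like Dijkstra's would replace the plain breadth-first search.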

What are some applications of social network analysis in the Big Data era 2?

• Ranking expertise: finding knowledgeable and influential people who are suited to a project
• Recommendation
• Customer Behavior Sequence Analytics
• Financial analysis
• Social media monitoring
• Anomaly Detection (Espionage, Sabotage, etc.)
• Fraud Detection
• Web page ranking
• Intelligent computing, e.g. IBM Watson graph matching to map symptoms to a disease.
• Visualization of social roles, as depicted below.

IntelligentGraph

One of the main obstacles in the area of Big Data is that, because a network can consist of millions or billions of connected objects, SNA becomes computationally intensive.
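The web page ranking application listed above gives a feel for this cost. The sketch below is a simplified PageRank computed by power iteration, written for this post and not taken from any production system: every iteration touches every link once, so the work grows with the number of edges multiplied by the number of iterations. The link data, the damping factor of 0.85 and the fixed iteration count are assumptions, and every link target is assumed to appear as a key in `out_links`.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Simplified PageRank via power iteration on a directed graph."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread over all pages.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            targets = out_links[node]
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


# Hypothetical site structure: page -> pages it links to.
out_links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}

ranks = pagerank(out_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

On a graph with billions of edges, even this simple loop has to be distributed across machines, which is where much of the engineering effort goes.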

Why are social networks studied by industry and academia?

Online Social Networks (OSNs) are the first data source that captures (i) how people are connected and how they affect the perspectives of individuals, groups and organizations around the globe, and (ii) how, as a whole, they create a culture, a trend, a financial market, a government, a company, and other social structures. OSNs are the only data source that reflects a multicultural society in miniature, hosting an unprecedented scale of personal data, data about events and social relationships, public sentiments and behaviours.
The unique element of these data is their networked nature, since they hold information about interactions among users, communities and content. The distribution pattern of these interactions is usually analysed via Social Network Analysis (SNA), which draws conclusions about the network as a whole or about those belonging to it.

OSNs have caused a huge shift in how businesses operate and compete, how governments act and influence, and how people communicate and share knowledge.
To understand that impact, look no further than how political and marketing campaigns analyse opinionated text, comments and multimedia content that is shared in the network they care about.

Businesses apply social network analysis to gain insight into markets. Multinational companies apply it to understand cultural differences and adapt their strategy accordingly. Analysing text posted by people from diverse cultures is a useful complement to PEST analysis for those who are interested in strategic analysis.
Governments apply social network analysis for law enforcement. For instance, the NSA uses it to map terrorist networks.

In addition, consider movie rental, which has become a service, offered for example by Netflix, that uses a vast array of data points to generate recommendations. This array usually contains information about the relationship of a user with other users, on the assumption that people with common ‘liked movies’ may share a similar taste in film.
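The ‘liked movies’ assumption can be sketched as a tiny neighbour-based recommender. This is an illustrative toy, not how Netflix or any real service works: users are compared by the Jaccard similarity of their liked-movie sets, and unseen movies are scored by the similarity of the users who liked them. All names and data are made up.

```python
def jaccard(a, b):
    """Overlap of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(liked, user, top_n=3):
    """Rank movies the user has not liked yet by neighbour similarity."""
    scores = {}
    for other, movies in liked.items():
        if other == user:
            continue
        similarity = jaccard(liked[user], movies)
        for movie in movies - liked[user]:
            scores[movie] = scores.get(movie, 0.0) + similarity
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda m: (-scores[m], m))[:top_n]


liked = {
    "ann": {"Network", "Alien", "Heat"},
    "bob": {"Network", "Alien", "Brazil"},
    "cat": {"Heat", "Fargo"},
}

print(recommend(liked, "ann"))  # ['Brazil', 'Fargo']
```

Here "bob" shares two of four movies with "ann" and "cat" shares one of four, so bob's extra title outranks cat's.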

Keep in mind that data from OSNs are mostly about people, so any analysis raises significant ethical and privacy concerns.

How to extract data from online social networks?

Public APIs (Application Programming Interfaces) are the standard means of retrieving social networking data from the cloud, and they typically encourage the development of third-party software, for example a plugin for WordPress. One alternative is to use commercial scraping tools that provide raw data or offer some extra filtering functionality. For instance, ‘Sociopedia: An Interactive System for Event Detection and Trend Analysis for Twitter Data’3 uses Sysomos4, a social monitoring tool, to detect specific events. Sysomos is also one of the tools used at the BBC for monitoring social media and website activities 5. Another alternative is to combine API functionality with a crawler, as the data mining tool TweCom 6 does. The crawler is built to extract information that is not returned by the API. Importantly, though, each social platform has very specific rules about how its data may be used, which can be found in its Terms of Service. Although most OSNs expose an API with methods to get a range of data, they limit the number of API transactions per day.
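The daily transaction limit shapes how a collector is written. The sketch below shows one possible pattern under stated assumptions: `fetch_page` is a hypothetical function that takes a cursor and returns a page of items plus the next cursor (or None when the data is exhausted), and `daily_quota` is the number of calls the platform allows. No real platform API is used here; a fake in-memory source stands in for it so the example runs offline.

```python
def collect(fetch_page, daily_quota):
    """Pull pages until the data runs out or the call budget is spent.

    Returns the items, the cursor to resume from (None if finished)
    and the number of API calls made.
    """
    items, cursor, calls = [], None, 0
    while calls < daily_quota:
        page, cursor = fetch_page(cursor)
        calls += 1
        items.extend(page)
        if cursor is None:  # nothing left to fetch
            break
    return items, cursor, calls


# Stand-in for a real API: ten fake posts served three at a time.
FAKE_POSTS = [f"post-{i}" for i in range(10)]


def fake_fetch_page(cursor, page_size=3):
    start = cursor or 0
    end = start + page_size
    next_cursor = end if end < len(FAKE_POSTS) else None
    return FAKE_POSTS[start:end], next_cursor


items, resume_from, calls = collect(fake_fetch_page, daily_quota=2)
print(len(items), resume_from, calls)  # 6 6 2
```

Storing `resume_from` lets the next day's run continue where the quota ran out, instead of spending calls on pages that were already retrieved.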

The bottom line: social network analysis studies how a network forms and evolves, including the characteristics of its connected nodes. It finds many applications in big data, such as recommendation and ranking systems, although it remains a computationally intensive task. Online social networks are a significant big data source whose linkage data can be analyzed through social network analysis.






