Data Mining

Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.

Personal Web Revisitation by Context and Content Keywords with Relevance Feedback

Introduction: Getting back to previously viewed web pages is a common yet difficult task for users due to the large volume of personally accessed information on the web. This paper leverages humans’ natural recall process of using episodic and semantic memory cues to facilitate …

PPRank: Economically Selecting Initial Users for Influence Maximization in Social Networks

Introduction: This paper focuses on seeking a new heuristic scheme for the influence maximization problem in social networks: how to economically select a subset of individuals (so-called seeds) to trigger a large cascade of further adoptions of a new behavior based on a contagion …
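
PPRank itself is not described in this excerpt. As a reference point for the underlying seed-selection problem, the following is a minimal sketch of the classic greedy baseline under an independent cascade model, with spread estimated by Monte Carlo simulation. The adjacency-dict graph format, the propagation probability `p`, and the number of simulation runs are illustrative assumptions; this is not the PPRank heuristic.

```python
import random


def simulate_ic(graph, seeds, p, rng):
    """One independent-cascade run: each newly activated node gets a single
    chance to activate each inactive out-neighbor with probability p."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, p, runs, rng):
    # Monte Carlo estimate of the expected number of activated nodes.
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs


def greedy_seeds(graph, k, p=0.1, runs=200, seed=0):
    """Greedy baseline (not PPRank): repeatedly add the node with the
    largest estimated marginal gain in expected spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    chosen = []
    for _ in range(k):
        base = expected_spread(graph, chosen, p, runs, rng)
        best, best_gain = None, -1.0
        # Sorted iteration makes tie-breaking deterministic.
        for n in sorted(nodes - set(chosen)):
            gain = expected_spread(graph, chosen + [n], p, runs, rng) - base
            if gain > best_gain:
                best, best_gain = n, gain
        chosen.append(best)
    return chosen
```

Each greedy step re-simulates the cascade for every candidate, which is exactly the cost that cheaper heuristics for this problem try to avoid.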

QDA: A Query Driven Approach to Entity Resolution

Introduction: This paper addresses the problem of query-aware data cleaning in the context of a user query. In particular, we develop a novel Query-Driven Approach (QDA) that systematically exploits the semantics of the predicates in SQL-like selection queries to reduce the data cleaning overhead. The objective of …

Query Expansion with Enriched User Profiles for Personalized Search Utilizing Folksonomy Data

Introduction: Query expansion has been widely adopted in Web search as a way of tackling the ambiguity of queries. Personalized search utilizing folksonomy data has demonstrated an extreme vocabulary mismatch problem that requires even more effective query expansion methods. Co-occurrence statistics, tag-tag relationships …
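
Co-occurrence statistics are one of the expansion signals the excerpt mentions. A minimal sketch, assuming each bookmark or profile entry is represented as a set of tags: the query is expanded with the tags that most often co-occur with the query tags. The data layout and the raw-count scoring are illustrative assumptions, not the paper's enriched-profile method.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tag_sets):
    """Count how often each unordered pair of tags appears together."""
    counts = Counter()
    for tags in tag_sets:
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts


def expand_query(query_tags, tag_sets, n=2):
    """Return the query tags plus the n non-query tags that co-occur
    most often with any query tag."""
    counts = cooccurrence_counts(tag_sets)
    query = set(query_tags)
    scores = Counter()
    for (a, b), c in counts.items():
        if a in query and b not in query:
            scores[b] += c
        elif b in query and a not in query:
            scores[a] += c
    # Descending score, then alphabetical order for stable ties.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_tags) + [t for t, _ in ranked[:n]]
```

For example, if "python" appears twice with "programming" and once each with "code", "snake", and "reptile", a one-tag query "python" expands to `["python", "programming", "code"]` with `n=2`.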

RAPARE: A Generic Strategy for Cold Start Rating Prediction Problem

Introduction: In recent years, recommender systems have become indispensable components of many e-commerce websites. One of the major challenges that largely remains open is the cold-start problem, which can be viewed as a barrier that keeps the cold-start users/items away from the existing ones. …

SociRank: Identifying and Ranking Prevalent News Topics Using Social Media Factors

Introduction: Mass media sources, specifically the news media, have traditionally informed us of daily events. In modern times, social media services such as Twitter provide an enormous amount of user-generated data, which have great potential to contain informative news-related content. For these resources to …

Towards Real-Time, Country-Level Location Classification of Worldwide Tweets

Introduction: The increased interest in using social media as a source for research has motivated work on the challenge of automatically geolocating tweets, given the lack of explicit location information in the majority of tweets. In contrast to much previous work that has focused on location classification …

Trajectory Community Discovery and Recommendation by Multisource Diffusion Modeling

Introduction: In this paper, we detect communities from trajectories. Existing algorithms for trajectory clustering usually rely on simplex representation and a single proximity-related metric. Unfortunately, additional information markers (e.g., social interactions or semantics in the spatial layout) are ignored, leading to the inability to fully discover …

User-Centric Similarity Search

Introduction: User preferences play a significant role in market analysis. In the database literature there has been extensive work on query primitives, such as the well-known top-k query that can be used for ranking products based on the preferences customers have expressed. Still, the fundamental operation that evaluates …
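
The excerpt refers to the top-k query as a primitive for ranking products by customer preferences. A minimal sketch, assuming preferences are expressed as a non-negative weight vector and products are scored with a linear (weighted-sum) function; the product names and attribute values are hypothetical. This shows only the standard top-k primitive, not the user-centric similarity operation the paper goes on to propose.

```python
import heapq


def top_k(products, weights, k):
    """Return the k product names with the highest weighted-sum score.

    products: mapping of name -> tuple of attribute values (higher is better)
    weights:  tuple of preference weights, one per attribute
    """
    def score(attrs):
        return sum(w * a for w, a in zip(weights, attrs))

    # heapq.nlargest keeps at most k candidates while scanning all products.
    best = heapq.nlargest(k, products.items(), key=lambda item: score(item[1]))
    return [name for name, _ in best]
```

A customer who cares only about the first attribute would pass `weights=(1.0, 0.0)`; shifting weight to the second attribute changes the ranking without touching the data.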

User Vitality Ranking and Prediction in Social Networking Services: a Dynamic Network Perspective

Introduction: Social networking services have become prevalent in many online communities, such as Twitter.com and Weibo.com, where millions of users interact with each other every day. One interesting and important problem in social networking services is to rank users based …

Understand Short Texts by Harvesting and Analyzing Semantic Knowledge

Introduction: Understanding short texts is crucial to many applications, but challenges abound. First, short texts do not always observe the syntax of a written language. As a result, traditional natural language processing tools, ranging from part-of-speech tagging to dependency parsing, cannot be easily applied. Second, short …

An Iterative Classification Scheme for Sanitizing Large-Scale Datasets

Introduction: Cheap ubiquitous computing enables the collection of massive amounts of personal data in a wide variety of domains. Many organizations aim to share such data while obscuring features that could disclose personally identifiable information. Much of this data exhibits weak structure (e.g., text), such that machine …

Analyzing Sentiments in One Go: A Supervised Joint Topic Modeling Approach

Introduction: In this work, we focus on modeling user-generated review and overall rating pairs, and aim to identify semantic aspects and aspect-level sentiments from review data as well as to predict overall sentiments of reviews. We propose a novel probabilistic supervised joint aspect and …

Collaborative Filtering-Based Recommendation of Online Social Voting

Introduction: Social voting is an emerging new feature in online social networks. It poses unique challenges and opportunities for recommendation. In this paper, we develop a set of matrix factorization (MF) and nearest-neighbor (NN)-based recommender systems (RSs) that explore user social network and group affiliation information for social …
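
Matrix factorization (MF) is one of the two recommender families named in the excerpt. As a generic illustration only, without the social-network and group-affiliation signals the paper adds, the sketch below learns user and item latent factors by stochastic gradient descent on observed ratings. The factor count, learning rate, regularization, epoch count, and toy ratings are all assumptions.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item factor vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Plain matrix factorization fitted by SGD.

    ratings: list of (user_index, item_index, rating) for observed entries.
    Returns user factors P and item factors Q.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q
```

Unobserved user-item pairs are then scored with `predict(P, Q, u, i)` and the highest-scoring items recommended.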

Computing Semantic Similarity of Concepts in Knowledge Graphs

Introduction: This paper presents a method for measuring the semantic similarity between concepts in Knowledge Graphs (KGs) such as WordNet and DBpedia. Previous work on semantic similarity methods has focused on either the structure of the semantic network between concepts (e.g., path length and depth), or only …
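
Path length is the structural signal the excerpt attributes to earlier methods. A minimal sketch of the simplest such measure, assuming the knowledge graph is given as an undirected edge list and using sim(a, b) = 1 / (1 + shortest path length). The toy taxonomy is hypothetical, and this is the structure-only baseline the paper contrasts with, not its own method.

```python
from collections import deque


def undirected(edges):
    """Build an undirected adjacency mapping from (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_path_len(graph, src, dst):
    """Breadth-first search; returns None if dst is unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr == dst:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


def path_similarity(graph, a, b):
    d = shortest_path_len(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)
```

In a taxonomy where "dog" and "cat" both sit under "mammal", their path length is 2, giving a similarity of 1/3; such a measure ignores everything except graph distance, which is the limitation the excerpt points to.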

Detecting Stress Based on Social Interactions in Social Networks

Introduction: Psychological stress threatens people’s health, and detecting it in a timely manner for proactive care is non-trivial. With the popularity of social media, people are used to sharing their daily activities and interacting with friends on social media platforms, making it feasible to leverage online social …

Dynamic Facet Ordering for Faceted Product Search Engines

Introduction: Faceted browsing is widely used in Web shops and product comparison sites. In these cases, a fixed ordered list of facets is often employed. This approach suffers from two main issues. First, one needs to invest a significant amount of time to devise an effective list. …

Efficient Clue-based Route Search on Road Networks

Introduction: With the advances in geo-positioning technologies and location-based services, it is now quite common for road networks to have textual content on their vertices. The problem of identifying an optimal route that covers a sequence of query keywords has been studied in recent years. However, in many …

Efficient Keyword-aware Representative Travel Route Recommendation

Introduction: With the popularity of social media (e.g., Facebook and Flickr), users can easily share their check-in records and photos during their trips. In view of the huge number of user historical mobility records in social media, we aim to discover travel experiences to facilitate trip planning. When planning …

Energy-Efficient Query Processing in Web Search Engines

Introduction: Web search engines are composed of thousands of query processing nodes, i.e., servers dedicated to processing user queries. These servers consume a significant amount of energy, mostly attributable to their CPUs, but they are necessary to ensure low latencies, since users expect sub-second response times …

Generating Query Facets using Knowledge Bases

Introduction: A query facet is a significant list of information nuggets that explains an underlying aspect of a query. Existing algorithms mine facets of a query by extracting frequent lists contained in top search results. The coverage of facets and facet items mined by these methods might …
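
The excerpt describes existing facet miners as extracting frequent lists from top search results. A minimal sketch of that frequency step, assuming each result page has already been parsed into candidate lists of items (the parsing is not shown and is an assumption); lists are compared as unordered item sets and kept if they appear in at least `min_support` results. This illustrates the baseline the paper contrasts with, not its knowledge-base method.

```python
from collections import Counter


def frequent_lists(results, min_support=2):
    """results: one entry per search result, each a list of candidate lists.

    Returns the lists (as sorted item lists) that occur in at least
    min_support distinct results.
    """
    support = Counter()
    for lists in results:
        # Count each distinct list at most once per result page.
        for item_set in {frozenset(lst) for lst in lists}:
            support[item_set] += 1
    return sorted(sorted(s) for s, c in support.items() if c >= min_support)
```

A list that appears in only one page, however relevant, is dropped by this step, which is one way such methods can miss facets.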

Influential Node Tracking on Dynamic Social Network: An Interchange Greedy Approach

Introduction: As both social network structure and the strength of influence between individuals evolve constantly, influential nodes must be tracked in a dynamic setting. To address this problem, we explore the Influential Node Tracking (INT) problem as an extension to the traditional Influence …

l-Injection: Toward Effective Collaborative Filtering Using Uninteresting Items

Introduction: We develop a novel framework, named l-injection, to address the sparsity problem of recommender systems. By carefully injecting low values into a selected set of unrated user-item pairs in a user-item matrix, we demonstrate that top-N recommendation accuracies of various collaborative filtering (CF) techniques can …
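
The core operation the excerpt describes is filling selected unrated user-item pairs with a low value before running a CF method. The sketch below illustrates only that injection step on a dense matrix. How the paper selects the uninteresting pairs is not given in the excerpt, so the `select` predicate, the injected value `low`, and the use of `0` for missing ratings are hypothetical.

```python
def inject_low_values(matrix, select, low=1, missing=0):
    """Return a copy of a user-item rating matrix in which unrated cells
    chosen by select(user, item) are filled with the value `low`.

    matrix: list of lists of ratings; `missing` marks an unrated cell
    select: hypothetical predicate marking pairs treated as uninteresting
    """
    result = [row[:] for row in matrix]  # leave the input matrix untouched
    for u, row in enumerate(result):
        for i, r in enumerate(row):
            if r == missing and select(u, i):
                result[u][i] = low
    return result
```

Any CF technique can then be trained on the returned, denser matrix; observed ratings are never overwritten.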

Mining Competitors from Large Unstructured Datasets

Introduction: In any competitive business, success is based on the ability to make an item more appealing to customers than the competition. A number of questions arise in the context of this task: how do we formalize and quantify the competitiveness between two items? Who are the main competitors …

Modeling Information Diffusion over Social Networks for Temporal Dynamic Prediction

Introduction: How to model the process of information diffusion in social networks is a critical research task. Although numerous attempts have been made in this area, few of them can simulate and predict the temporal dynamics of the diffusion process. To address this problem, we …

Topic Rehotting Prediction: When to Make a Topic Popular Again? A Temporal Model for Topic Rehotting Prediction in Online Social Networks

Introduction: Topic rehotting prediction is a popular technique in social networks. Detecting hot topics is also popular, as it can benefit many tasks, including topic recommendation, the guidance of public opinion, and so on. However, in some cases, people may want to know when to re-hot a topic, i.e., …