All posts by SmartBit Infotech

About SmartBit Infotech

Software company in Pune, Maharashtra

An Overlay Architecture for Throughput Optimal Multipath Routing

Introduction:

Legacy networks are often designed to operate with simple single-path routing, such as shortest-path routing, which is known to be throughput suboptimal. On the other hand, previously proposed throughput optimal policies (e.g., backpressure) require every device in the network to make dynamic routing decisions. In this paper, we study an overlay architecture for dynamic routing, such that only a subset of devices (overlay nodes) need to make dynamic routing decisions. We determine the essential collection of nodes that must bifurcate traffic for achieving the maximum multi-commodity network throughput. We apply our optimal node placement algorithm to several graphs, and the results show that a small fraction of overlay nodes is sufficient for achieving maximum throughput. Finally, we propose a threshold-based policy (BP-T) and a heuristic policy (OBP), which dynamically control traffic bifurcations at overlay nodes. Policy BP-T is proved to maximize throughput when underlay paths do not overlap. In all studied simulation scenarios, OBP not only achieves full throughput but also reduces delay in comparison to throughput optimal backpressure routing.
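The backpressure rule that BP-T and OBP build on can be sketched in a few lines: on each link, route the commodity with the largest queue differential, and only if that differential is positive. The node names, commodities, and queue values below are hypothetical.

```python
def backpressure_decision(queues, link):
    """Pick the commodity with the largest queue differential on a link.

    queues: dict node -> dict commodity -> backlog
    link:   (u, v) tuple
    Returns (commodity, weight); traffic is sent only if weight > 0.
    """
    u, v = link
    best_c, best_w = None, 0
    for c in queues[u]:
        w = queues[u][c] - queues[v].get(c, 0)
        if w > best_w:
            best_c, best_w = c, w
    return best_c, best_w

queues = {
    "A": {"red": 8, "blue": 3},
    "B": {"red": 2, "blue": 6},
}
print(backpressure_decision(queues, ("A", "B")))  # ('red', 6)
```

In the overlay setting studied here, only overlay nodes evaluate such a rule; legacy nodes keep forwarding along fixed underlay paths.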

Reference IEEE paper:

“An Overlay Architecture for Throughput Optimal Multipath Routing”, IEEE/ACM Transactions on Networking, 2017.

Unique ID – SBI1054

Domain – NETWORKING

Book your project now. Check out other projects here.

User Centric Similarity Search

Introduction:

User preferences play a significant role in market analysis. In the database literature there has been extensive work on query primitives, such as the well-known top-k query, which can be used to rank products based on the preferences customers have expressed. Still, the fundamental operation that evaluates the similarity between products is typically performed ignoring these preferences. Instead, products are depicted in a feature space based on their attributes, and similarity is computed via traditional distance metrics on that space. In this work we utilize the rankings of the products based on the opinions of their customers in order to map the products into a user-centric space where similarity calculations are performed. We identify important properties of this mapping that result in upper and lower similarity bounds, which in turn permit us to utilize conventional multidimensional indexes on the original product space to perform these user-centric similarity computations. We show how interesting similarity calculations motivated by the commonly used range and nearest-neighbor queries can be performed efficiently, while pruning significant parts of the data set based on the bounds we derive on the user-centric similarity of products.
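A minimal sketch of the user-centric mapping described above: each product is represented by its rank in every customer's preference order, and similarity becomes a distance in that rank space. The users, products, and the L1 distance choice are all illustrative assumptions, not the paper's exact construction.

```python
def rank_vector(product, rankings):
    """rankings: dict user -> list of products, ordered best-first."""
    return [ranking.index(product) for ranking in rankings.values()]

def user_centric_distance(p, q, rankings):
    vp, vq = rank_vector(p, rankings), rank_vector(q, rankings)
    return sum(abs(a - b) for a, b in zip(vp, vq))  # L1 distance over ranks

rankings = {
    "alice": ["p1", "p2", "p3"],
    "bob":   ["p2", "p1", "p3"],
}
print(user_centric_distance("p1", "p2", rankings))  # 2
```

The bounds mentioned in the abstract would relate this rank-space distance to the original feature-space distance, so a multidimensional index on features can prune candidates without computing every rank vector.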

Reference IEEE paper:

“User-Centric Similarity Search”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Unique ID – SBI1052

Domain – DATA MINING


User Vitality Ranking and Prediction in Social Networking Services: a Dynamic Network Perspective

Introduction:

Social networking services are prevalent in many online communities, such as Twitter.com and Weibo.com, where millions of users interact with each other every day. One interesting and important problem in social networking services is to rank users based on their vitality in a timely fashion. An accurate vitality-based ranking list of users could benefit many parties in social networking services, such as advertisement providers and site operators. Although it is very promising to obtain such a ranking list, there are many technical challenges due to the large scale and dynamics of social networking data. In this paper, we propose a unique perspective to achieve this goal: quantifying user vitality by analyzing the dynamic interactions among users on social networks. Examples of such networks include, but are not limited to, social networks on microblog sites and academic collaboration networks. Intuitively, if a user has many interactions with his friends within a time period while most of his friends do not have many interactions with their friends at the same time, it is very likely that this user has high vitality. Based on this idea, we develop quantitative measurements for user vitality and propose our first algorithm for ranking users based on vitality. We further consider the mutual influence between users while computing the vitality measurements and propose a second ranking algorithm, which computes user vitality iteratively. Beyond user vitality ranking, we also introduce a vitality prediction problem, which is likewise of great importance for many applications in social networking services. Along this line, we develop a customized prediction model to solve the vitality prediction problem. To evaluate the performance of our algorithms, we collect two dynamic social network data sets. The experimental results on both data sets clearly demonstrate the advantage of our ranking and prediction methods.
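The iterative intuition above can be sketched as a fixed-point computation: a user's vitality grows with their own interactions and shrinks when their friends are equally active. This is a hedged toy formula of our own, not the paper's exact measurement; the interaction counts and friend lists are made up.

```python
def iterate_vitality(interactions, friends, rounds=10):
    """Toy iterative vitality: own activity discounted by friends' vitality."""
    v = dict(interactions)  # initialise with raw interaction counts
    for _ in range(rounds):
        nxt = {}
        for u in v:
            neigh = friends[u]
            avg_friend = sum(v[f] for f in neigh) / len(neigh)
            nxt[u] = interactions[u] / (1.0 + avg_friend)
        v = nxt
    return v

interactions = {"a": 10, "b": 1, "c": 1}
friends = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
v = iterate_vitality(interactions, friends)
print(max(v, key=v.get))  # 'a': very active while its friends are quiet
```

Updating all users from the previous round's values, rather than in place, is what makes the mutual-influence computation well defined.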

Reference IEEE paper:

“User Vitality Ranking and Prediction in Social Networking Services: a Dynamic Network Perspective”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Unique ID – SBI1051

Domain – DATA MINING


Understand Short Texts by Harvesting and Analyzing Semantic Knowledge

Introduction:
Understanding short texts is crucial to many applications, but challenges abound. First, short texts do not always observe the syntax of a written language. As a result, traditional natural language processing tools, ranging from part-of-speech tagging to dependency parsing, cannot be easily applied. Second, short texts usually do not contain sufficient statistical signals to support many state-of-the-art approaches for text mining such as topic modelling. Third, short texts are more ambiguous and noisy, and are generated in an enormous volume, which further increases the difficulty to handle them. We argue that semantic knowledge is required in order to better understand short texts. In this work, we build a prototype system for short text understanding which exploits semantic knowledge provided by a well-known knowledge base and automatically harvested from a web corpus. Our knowledge-intensive approaches disrupt traditional methods for tasks such as text segmentation, part-of-speech tagging, and concept labelling, in the sense that we focus on semantics in all these tasks. We conduct a comprehensive performance evaluation on real-life data. The results show that semantic knowledge is indispensable for short text understanding, and our knowledge-intensive approaches are both effective and efficient in discovering semantics of short texts.

Reference IEEE paper:
“Understand Short Texts by Harvesting and Analyzing Semantic Knowledge”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Unique ID – SBI1050

Domain – DATA MINING


RAAC: Robust and Auditable Access Control with Multiple Attribute Authorities for Public Cloud Storage

Introduction:
Data access control is a challenging issue in public cloud storage systems. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) has been adopted as a promising technique to provide flexible, fine-grained and secure data access control for cloud storage with honest-but-curious cloud servers. However, in existing CP-ABE schemes, a single attribute authority must execute the time-consuming user legitimacy verification and secret key distribution, which results in a single-point performance bottleneck when a CP-ABE scheme is adopted in a large-scale cloud storage system. Users may be stuck in the waiting queue for a long period to obtain their secret keys, resulting in low system efficiency. Although multi-authority access control schemes have been proposed, these schemes still cannot overcome the drawbacks of the single-point bottleneck and low efficiency, because each authority still independently manages a disjoint attribute set. In this paper, we propose a novel heterogeneous framework to remove the single-point performance bottleneck and provide a more efficient access control scheme with an auditing mechanism. Our framework employs multiple attribute authorities to share the load of user legitimacy verification. Meanwhile, in our scheme, a CA (Central Authority) is introduced to generate secret keys for legitimacy-verified users. Unlike other multi-authority access control schemes, each of the authorities in our scheme manages the whole attribute set individually. To enhance security, we also propose an auditing mechanism to detect which AA (Attribute Authority) has incorrectly or maliciously performed the legitimacy verification procedure. Analysis shows that our system not only guarantees the security requirements but also greatly improves performance on key generation.

Reference IEEE paper:
“RAAC: Robust and Auditable Access Control with Multiple Attribute Authorities for Public Cloud Storage”, IEEE Transactions on Information Forensics and Security, 2017.

Unique ID – SBI1022

Domain – CLOUD COMPUTING


Secure Data Sharing in Cloud Computing Using Revocable Storage Identity Based Encryption

Introduction:
Cloud computing provides a flexible and convenient way of data sharing, which brings various benefits to both society and individuals. But there is a natural resistance among users to directly outsourcing shared data to the cloud server, since the data often contain valuable information. Thus, it is necessary to place cryptographically enhanced access control on the shared data. Identity-based encryption is a promising cryptographic primitive for building a practical data sharing system. However, access control is not static. That is, when some user's authorization has expired, there should be a mechanism that can remove him/her from the system, so that the revoked user can access neither the previously nor the subsequently shared data. To this end, we propose a notion called revocable-storage identity-based encryption (RS-IBE), which can provide the forward/backward security of ciphertext by introducing the functionalities of user revocation and ciphertext update simultaneously. Furthermore, we present a concrete construction of RS-IBE and prove its security in the defined security model. The performance comparisons indicate that the proposed RS-IBE scheme has advantages in terms of functionality and efficiency, and is thus feasible for a practical and cost-effective data-sharing system. Finally, we provide implementation results of the proposed scheme to demonstrate its practicability.

Reference IEEE paper:
“Secure Data Sharing in Cloud Computing Using Revocable-Storage Identity-Based Encryption”, IEEE Transactions on Cloud Computing, 2017.

Unique ID – SBI1023

Domain – CLOUD COMPUTING


Securing Cloud Data under Key Exposure

Introduction:
Recent news reveals powerful attackers who break data confidentiality by acquiring cryptographic keys, by means of coercion or backdoors in cryptographic software. Once the encryption key is exposed, the only viable measure to preserve data confidentiality is to limit the attacker's access to the ciphertext. This may be achieved, for example, by spreading ciphertext blocks across servers in multiple administrative domains, thus assuming that the adversary cannot compromise all of them. Nevertheless, if data is encrypted with existing schemes, an adversary equipped with the encryption key can still compromise a single server and decrypt the ciphertext blocks stored therein. In this paper, we study data confidentiality against an adversary who knows the encryption key and has access to a large fraction of the ciphertext blocks. To this end, we propose Bastion, a novel and efficient scheme that guarantees data confidentiality even if the encryption key is leaked and the adversary has access to almost all ciphertext blocks. We analyze the security of Bastion, and we evaluate its performance by means of a prototype implementation. We also discuss practical insights with respect to the integration of Bastion in commercial dispersed storage systems. Our evaluation results suggest that Bastion is well suited for integration in existing systems, since it incurs less than 5% overhead compared to existing semantically secure encryption modes.
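The dispersal idea above can be illustrated with a toy all-or-nothing transform over ciphertext blocks: each output block is the XOR of all the other blocks, so an adversary missing even one block (on an uncompromised server) cannot invert the transform. This is a simplified sketch in the spirit of Bastion, not the paper's exact construction; the inversion trick below assumes an even number of blocks.

```python
def aont(blocks):
    """Map each block b to the XOR of all blocks except b."""
    total = 0
    for b in blocks:
        total ^= b
    return [total ^ b for b in blocks]

def aont_inverse(shares):
    """Invert the transform; valid when len(shares) is even,
    since then XOR-ing all shares recovers the original XOR-sum."""
    total = 0
    for s in shares:
        total ^= s
    return [total ^ s for s in shares]

blocks = [0x11, 0x22, 0x33, 0x44]   # stand-ins for ciphertext blocks
shares = aont(blocks)                # spread these across servers
assert aont_inverse(shares) == blocks
```

Each share would then be stored in a different administrative domain, so decrypting any single server's holdings reveals nothing even with the key.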

Reference IEEE paper:
“Securing Cloud Data under Key Exposure”, IEEE Transactions on Cloud Computing, 2017.

Unique ID – SBI1024

Domain – CLOUD COMPUTING


TAFC: Time and Attribute Factors Combined Access Control for Time-Sensitive Data in Public Cloud

Introduction:
The new paradigm of outsourcing data to the cloud is a double-edged sword. On the one hand, it frees data owners from technical management and makes it easier for them to share their data with intended users. On the other hand, it poses new challenges for privacy and security protection. To protect data confidentiality against the honest-but-curious cloud service provider, numerous works have been proposed to support fine-grained data access control. However, until now, no scheme could support both fine-grained access control and time-sensitive data publishing. In this paper, by embedding timed-release encryption into CP-ABE (Ciphertext-Policy Attribute-Based Encryption), we propose a new time and attribute factors combined access control scheme for time-sensitive data in public cloud storage (named TAFC). Based on the proposed scheme, we further propose an efficient approach to design access policies that face diverse access requirements for time-sensitive data. Extensive security and performance analysis shows that our proposed scheme is highly efficient and satisfies the security requirements for time-sensitive data storage in the public cloud.

Reference IEEE paper:
“TAFC: Time and Attribute Factors Combined Access Control for Time-Sensitive Data in Public Cloud”, IEEE Transactions on Services Computing, 2017.

Unique ID – SBI1025

Domain – CLOUD COMPUTING


TEES: An Efficient Search Scheme over Encrypted Data on Mobile Cloud

Introduction:
Cloud storage provides convenient, massive, and scalable storage at low cost, but data privacy is a major concern that prevents users from storing files on the cloud trustingly. One way of enhancing privacy from the data owner's point of view is to encrypt files before outsourcing them to the cloud and decrypt them after downloading. However, data encryption is a heavy overhead for mobile devices, and data retrieval incurs complicated communication between the data user and the cloud. With typically limited bandwidth capacity and battery life, these issues introduce heavy computation and communication overhead as well as higher power consumption for mobile device users, which makes encrypted search over the mobile cloud very challenging. In this paper, we propose TEES (Traffic and Energy saving Encrypted Search), a bandwidth- and energy-efficient encrypted search architecture over the mobile cloud. The proposed architecture offloads computation from mobile devices to the cloud, and we further optimize the communication between mobile clients and the cloud. It is demonstrated that data privacy does not degrade when the performance enhancement methods are applied. Our experiments show that TEES reduces computation time by 23% to 46% and saves energy consumption by 35% to 55% per file retrieval, while network traffic during file retrievals is also significantly reduced.

Reference IEEE paper:
“TEES: An Efficient Search Scheme over Encrypted Data on Mobile Cloud”, IEEE Transactions on Cloud Computing, 2017.

Unique ID – SBI1026

Domain – CLOUD COMPUTING


Two-Cloud Secure Database for Numeric-Related SQL Range Queries with Privacy Preserving

Introduction:
Industries and individuals outsource databases to realize convenient and low-cost applications and services. In order to provide sufficient functionality for SQL queries, many secure database schemes have been proposed. However, such schemes are vulnerable to privacy leakage to the cloud server, mainly because the database is hosted and processed in the cloud server, which is beyond the control of the data owners. For numerical range queries (“>”, “<”, etc.), those schemes cannot provide sufficient privacy protection against practical challenges, e.g., privacy leakage of statistical properties and access pattern. Furthermore, an increased number of queries will inevitably leak more information to the cloud server. In this paper, we propose a two-cloud architecture for secure databases, with a series of intersection protocols that provide privacy preservation for various numeric-related range queries. Security analysis shows that the privacy of numerical information is strongly protected against cloud providers in our proposed scheme.

Reference IEEE paper:
“Two-Cloud Secure Database for Numeric-Related SQL Range Queries with Privacy Preserving”, IEEE Transactions on Information Forensics and Security, 2017.

Unique ID – SBI1027

Domain – CLOUD COMPUTING


An Iterative Classification Scheme for Sanitizing Large-Scale Datasets

Introduction:
Cheap ubiquitous computing enables the collection of massive amounts of personal data in a wide variety of domains. Many organizations aim to share such data while obscuring features that could disclose personally identifiable information. Much of this data exhibits weak structure (e.g., text), such that machine learning approaches have been developed to detect and remove identifiers from it. While learning is never perfect and relying on such approaches to sanitize data can leak sensitive information, a small risk is often acceptable. Our goal is to balance the value of published data and the risk of an adversary discovering leaked identifiers. We model data sanitization as a game between 1) a publisher who chooses a set of classifiers to apply to data and publishes only instances predicted as non-sensitive and 2) an attacker who combines machine learning and manual inspection to uncover leaked identifying information. We introduce a fast iterative greedy algorithm for the publisher that ensures a low utility for a resource-limited adversary. Moreover, using five text data sets we illustrate that our algorithm leaves virtually no automatically identifiable sensitive instances for a state-of-the-art learning algorithm, while sharing over 93% of the original data, and completes after at most 5 iterations.
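The publisher's iterative loop described above can be sketched schematically: train on the current data, drop everything predicted sensitive, and repeat until nothing new is flagged or the iteration cap is hit. The threshold "classifier" below is a deliberately trivial stand-in (any learner with a train/predict interface would do); the numeric instances are made up.

```python
def sanitize(instances, labels, train, predict, max_iters=5):
    """Repeatedly train on current data and drop predicted-sensitive items."""
    published = list(zip(instances, labels))
    for _ in range(max_iters):
        model = train(published)
        kept = [(x, y) for x, y in published if predict(model, x) == 0]
        if len(kept) == len(published):  # nothing new flagged: converged
            break
        published = kept
    return [x for x, _ in published]

def train(data):
    # Toy "classifier": threshold at the mean of sensitive-labeled values.
    sens = [x for x, y in data if y == 1]
    return sum(sens) / len(sens) if sens else float("inf")

def predict(model, x):
    return 1 if x >= model else 0

print(sanitize([1, 2, 9, 10], [0, 0, 1, 1], train, predict))  # [1, 2]
```

Each pass retrains on the shrinking published set, which is what lets the scheme catch instances the first classifier missed.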

Reference IEEE paper:
“An Iterative Classification Scheme for Sanitizing Large-Scale Datasets”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Unique ID – SBI1028

Domain – DATA MINING


Analyzing Sentiments in One Go: A Supervised Joint Topic Modeling Approach

Introduction:
In this work, we focus on modeling user-generated review and overall rating pairs, and aim to identify semantic aspects and aspect-level sentiments from review data as well as to predict overall sentiments of reviews. We propose a novel probabilistic supervised joint aspect and sentiment model (SJASM) to deal with these problems in one go under a unified framework. SJASM represents each review document in the form of opinion pairs, and can simultaneously model aspect terms and corresponding opinion words of the review for hidden aspect and sentiment detection. It also leverages sentimental overall ratings, which often come with online reviews, as supervision data, and can infer semantic aspects and aspect-level sentiments that are not only meaningful but also predictive of overall sentiments of reviews. Moreover, we develop an efficient inference method for parameter estimation of SJASM based on collapsed Gibbs sampling. We evaluate SJASM extensively on real-world review data, and experimental results demonstrate that the proposed model outperforms seven well-established baseline methods for sentiment analysis tasks.

Reference IEEE paper:
“Analyzing Sentiments in One Go: A Supervised Joint Topic Modeling Approach”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Unique ID – SBI1029

Domain – DATA MINING


Collaborative Filtering-Based Recommendation of Online Social Voting

Introduction:
Social voting is an emerging new feature in online social networks. It poses unique challenges and opportunities for recommendation. In this paper, we develop a set of matrix factorization (MF) and nearest-neighbor (NN)-based recommender systems (RSs) that explore user social network and group affiliation information for social voting recommendation. Through experiments with real social voting traces, we demonstrate that social network and group affiliation information can significantly improve the accuracy of popularity-based voting recommendation, and that social network information dominates group affiliation information in NN-based approaches. We also observe that social and group information is much more valuable to cold users than to heavy users. In our experiments, simple meta-path-based NN models outperform computation-intensive MF models in hot-voting recommendation, while users' interests in non-hot votings can be better mined by MF models. We further propose a hybrid RS, bagging different single approaches to achieve the best top-k hit rate.
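A minimal matrix-factorization sketch of the MF recommenders mentioned above: learn low-rank user and item factors by stochastic gradient descent on observed user-vote interactions. The dimensions, learning rate, epoch count, and the tiny vote matrix are all hypothetical; the social and group regularization terms of the actual models are omitted.

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.05, epochs=500):
    """Plain SGD matrix factorization on (user, item, rating) triples."""
    random.seed(0)
    P = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):                 # gradient step on both factors
                P[u][f] += lr * err * Q[i][f]
                Q[i][f] += lr * err * P[u][f]
    return P, Q

ratings = [(0, 0, 1.0), (0, 1, 0.0), (1, 0, 1.0)]  # (user, vote, liked?)
P, Q = factorize(ratings, n_users=2, n_items=2)
pred = sum(P[1][f] * Q[1][f] for f in range(2))    # score user 1 on vote 1
```

The unobserved (user 1, vote 1) score is the recommendation signal; the NN-based alternatives in the paper instead score votes via similar users along social or group links.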

Reference IEEE paper:
“Collaborative Filtering-Based Recommendation of Online Social Voting”, IEEE Transactions on Computational Social Systems, 2017.

Unique ID – SBI1030

Domain – DATA MINING


Computing Semantic Similarity of Concepts in Knowledge Graphs

Introduction:
This paper presents a method for measuring the semantic similarity between concepts in Knowledge Graphs (KGs) such as WordNet and DBpedia. Previous work on semantic similarity methods has focused either on the structure of the semantic network between concepts (e.g., path length and depth) or only on the Information Content (IC) of concepts. We propose a semantic similarity method, namely wpath, that combines these two approaches, using IC to weight the shortest path length between concepts. Conventional corpus-based IC is computed from the distributions of concepts over a textual corpus, which requires preparing a domain corpus containing annotated concepts and has a high computational cost. Since instances are already extracted from textual corpora and annotated by concepts in KGs, we propose graph-based IC, which computes IC from the distributions of concepts over instances. Through experiments performed on well-known word similarity datasets, we show that the wpath semantic similarity method produces a statistically significant improvement over other semantic similarity methods. Moreover, in a real category classification evaluation, the wpath method has shown the best performance in terms of accuracy and F-score.
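The combination described above can be sketched concretely. The published wpath form is sim(c1, c2) = 1 / (1 + len(c1, c2) * k^IC(lcs)), where len is the shortest path between the concepts, lcs is their lowest common subsumer, and IC is computed from instance counts. The tiny taxonomy, instance counts, and the k value below are made-up illustrations.

```python
import math

# Toy taxonomy (child -> parent) with cumulative instance counts per concept.
parent = {"dog": "animal", "cat": "animal", "animal": "thing", "thing": None}
instances = {"dog": 30, "cat": 30, "animal": 80, "thing": 100}

def ancestors(c):
    path = []
    while c is not None:
        path.append(c)
        c = parent[c]
    return path

def wpath(c1, c2, k=0.8):
    a1, a2 = ancestors(c1), ancestors(c2)
    lcs = next(a for a in a1 if a in a2)          # lowest common subsumer
    length = a1.index(lcs) + a2.index(lcs)        # shortest path via the lcs
    ic = -math.log(instances[lcs] / instances["thing"])  # graph-based IC
    return 1.0 / (1.0 + length * k ** ic)

print(wpath("dog", "cat"))
```

Because IC("thing") is 0, sharing only the root yields plain path-based similarity, while a more specific (higher-IC) subsumer shrinks the effective path length and raises the score.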

Reference IEEE paper:
“Computing Semantic Similarity of Concepts in Knowledge Graphs”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Unique ID – SBI1031

Domain – DATA MINING


Detecting Stress Based on Social Interactions in Social Networks

Introduction:
Psychological stress is threatening people's health. It is non-trivial to detect stress in a timely manner for proactive care. With the popularity of social media, people are used to sharing their daily activities and interacting with friends on social media platforms, making it feasible to leverage online social network data for stress detection. In this paper, we find that a user's stress state is closely related to that of his/her friends in social media, and we employ a large-scale dataset from real-world social platforms to systematically study the correlation between users' stress states and social interactions. We first define a set of stress-related textual, visual, and social attributes from various aspects, and then propose a novel hybrid model, a factor graph model combined with a Convolutional Neural Network, to leverage tweet content and social interaction information for stress detection. Experimental results show that the proposed model can improve detection performance by 6-9% in F1-score. By further analyzing the social interaction data, we also discover several intriguing phenomena: for example, the number of social structures with sparse connections (i.e., with no delta connections) is around 14% higher for stressed users than for non-stressed users, indicating that the social structure of stressed users' friends tends to be less connected and less complicated than that of non-stressed users.

Reference IEEE paper:
“Detecting Stress Based on Social Interactions in Social Networks”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Unique ID – SBI1032

Domain – DATA MINING


Dynamic Facet Ordering for Faceted Product Search Engines

Introduction:
Faceted browsing is widely used in Web shops and product comparison sites. In these cases, a fixed ordered list of facets is often employed. This approach suffers from two main issues. First, one needs to invest a significant amount of time to devise an effective list. Second, with a fixed list of facets it can happen that a facet becomes useless if all products that match the query are associated with that particular facet. In this work, we present a framework for dynamic facet ordering in e-commerce. Based on measures for the specificity and dispersion of facet values, the fully automated algorithm ranks those properties and facets on top that lead to a quick drill-down for any possible target product. In contrast to existing solutions, the framework addresses e-commerce-specific aspects, such as the possibility of multiple clicks, the grouping of facets by their corresponding properties, and the abundance of numeric facets. In a large-scale simulation and user study, our approach was, in general, favorably compared to a facet list created by domain experts, a greedy approach as baseline, and a state-of-the-art entropy-based solution.
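One simple way to instantiate the specificity/dispersion idea above (close in spirit to the entropy-based baseline the paper compares against, not the paper's exact measures): score each facet by the entropy of its value distribution over the products still matching the query, so facets that split the result set evenly come first and facets shared by every product sink to the bottom. The toy product list is made up.

```python
import math
from collections import Counter

def facet_entropy(products, facet):
    """Entropy of a facet's value distribution over matching products."""
    values = [p[facet] for p in products if facet in p]
    if not values:
        return 0.0
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def order_facets(products, facets):
    return sorted(facets, key=lambda f: facet_entropy(products, f), reverse=True)

products = [
    {"brand": "A", "color": "red"},
    {"brand": "A", "color": "blue"},
    {"brand": "A", "color": "green"},
]
print(order_facets(products, ["brand", "color"]))  # ['color', 'brand']
```

Here "brand" is useless (every product matches it), exactly the fixed-list failure mode the abstract describes, so the dynamic ordering demotes it.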

Reference IEEE paper:
“Dynamic Facet Ordering for Faceted Product Search Engines”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Unique ID – SBI1033

Domain – DATA MINING


Efficient Clue-based Route Search on Road Networks

Introduction:
With the advances in geo-positioning technologies and location-based services, it is nowadays quite common for road networks to have textual contents on the vertices. The problem of identifying an optimal route that covers a sequence of query keywords has been studied in recent years. However, in many practical scenarios, an optimal route might not always be desirable. For example, a personalized route query may be issued by providing some clues that describe the spatial context between PoIs along the route, where the result can be far from the optimal one. Therefore, in this paper, we investigate the problem of clue-based route search (CRS), which allows a user to provide clues on keywords and spatial relationships. First, we propose a greedy algorithm and a dynamic programming algorithm as baselines. To improve efficiency, we develop a branch-and-bound algorithm that prunes unnecessary vertices during query processing. In order to quickly locate candidates, we propose an AB-tree that stores both distance and keyword information in a tree structure. To further reduce the index size, we construct a PB-tree by utilizing the virtue of a 2-hop label index to pinpoint candidates. Extensive experiments are conducted and verify the superiority of our algorithms and index structures.

Reference IEEE paper:
“Efficient Clue-based Route Search on Road Networks”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Unique ID – SBI1034

Domain – DATA MINING


Efficient Keyword-aware Representative Travel Route Recommendation

Introduction:
With the popularity of social media (e.g., Facebook and Flickr), users can easily share their check-in records and photos during their trips. In view of the huge number of user historical mobility records in social media, we aim to discover travel experiences to facilitate trip planning. When planning a trip, users always have specific preferences regarding their trips. Instead of restricting users to limited query options such as locations, activities, or time periods, we consider arbitrary text descriptions as keywords about personalized requirements. Moreover, a diverse and representative set of recommended travel routes is needed. Prior works have elaborated on mining and ranking existing routes from check-in data. To meet the need for automatic trip organization, we claim that more features of Places of Interest (POIs) should be extracted. Therefore, in this paper, we propose an efficient Keyword-aware Representative Travel Route framework that uses knowledge extracted from users' historical mobility records and social interactions. Explicitly, we have designed a keyword extraction module to classify POI-related tags for effective matching with query keywords. We have further designed a route reconstruction algorithm to construct route candidates that fulfill the requirements. To provide befitting query results, we explore Representative Skyline concepts, that is, the Skyline routes which best describe the trade-offs among different POI features. To evaluate the effectiveness and efficiency of the proposed algorithms, we have conducted extensive experiments on real location-based social network datasets, and the experimental results show that our methods do indeed demonstrate good performance compared to state-of-the-art works.

Reference IEEE paper:
“Efficient Keyword-aware Representative Travel Route Recommendation”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Unique ID – SBI1035

Domain – DATA MINING


Energy-Efficient Query Processing in Web Search Engines

Introduction:
Web search engines are composed of thousands of query processing nodes, i.e., servers dedicated to processing user queries. So many servers consume a significant amount of energy, mostly attributable to their CPUs, but they are necessary to ensure low latencies, since users expect sub-second response times (e.g., 500 ms). However, users can hardly notice response times that are faster than their expectations. Hence, we propose the Predictive Energy Saving Online Scheduling Algorithm (PESOS) to select the most appropriate CPU frequency to process a query on a per-core basis. PESOS aims at processing queries by their deadlines, and leverages high-level scheduling information to reduce the CPU energy consumption of a query processing node. PESOS bases its decisions on query efficiency predictors, which estimate the processing volume and processing time of a query. We experimentally evaluate PESOS on the TREC ClueWeb09B collection and the MSN2006 query log. Results show that PESOS can reduce the CPU energy consumption of a query processing node by up to 48% compared to a system running at maximum CPU core frequency. PESOS also outperforms the best state-of-the-art competitor with a 20% energy saving, while the competitor requires fine parameter tuning and may incur uncontrollable latency violations.
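The core idea can be shown in miniature: given a predicted processing volume for a query and the available CPU frequencies, pick the lowest frequency that still meets the deadline, since anything faster only burns energy the user cannot perceive. The frequencies, cycle counts, and deadline below are hypothetical, and real PESOS additionally accounts for queued queries and predictor uncertainty.

```python
def pick_frequency(predicted_cycles, deadline_s, freqs_hz):
    """Return the lowest CPU frequency finishing within the deadline."""
    for f in sorted(freqs_hz):                   # try slow (cheap) first
        if predicted_cycles / f <= deadline_s:
            return f
    return max(freqs_hz)                         # deadline unmeetable: go flat out

freqs = [1.2e9, 1.8e9, 2.4e9]
print(pick_frequency(predicted_cycles=9e8, deadline_s=0.5, freqs_hz=freqs))
```

With a looser deadline the same query would be served at 1.2 GHz, which is where the energy saving over an always-max-frequency system comes from.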

Reference IEEE paper:
“Energy-efficient Query Processing in Web Search Engines”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Unique ID – SBI1036

Domain – DATA MINING


Generating Query Facets using Knowledge Bases

Introduction:
A query facet is a significant list of information nuggets that explains an underlying aspect of a query. Existing algorithms mine facets of a query by extracting frequent lists contained in the top search results. The coverage of facets and facet items mined by such methods may be limited, because only a small number of search results are used. To solve this problem, we propose mining query facets by using knowledge bases, which contain high-quality structured data. Specifically, we first generate facets based on the properties of the entities that are contained in Freebase and correspond to the query. Second, we mine initial query facets from search results, and then expand them by finding similar entities in Freebase. Experimental results show that our proposed method can significantly improve the coverage of facet items over state-of-the-art algorithms.
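The two steps described above can be sketched in miniature: (1) keep list items that recur across the top search results, (2) expand the facet with knowledge-base entities of the same type. The result lists and the tiny "knowledge base" mapping below are fabricated examples, not Freebase data.

```python
# Toy facet mining: frequent list items, then type-based expansion from a KB.
from collections import Counter

def mine_facet_items(result_lists, min_support=2):
    """Keep items that appear in at least min_support extracted lists."""
    counts = Counter(item for lst in result_lists for item in set(lst))
    return {item for item, c in counts.items() if c >= min_support}

def expand_with_kb(facet_items, kb_types):
    """Add entities sharing a knowledge-base type with any mined item."""
    types = {kb_types[i] for i in facet_items if i in kb_types}
    return facet_items | {e for e, t in kb_types.items() if t in types}

lists = [["iphone", "galaxy"], ["iphone", "pixel"], ["galaxy", "pixel"]]
kb = {"iphone": "phone", "galaxy": "phone", "pixel": "phone", "xperia": "phone"}
facet = expand_with_kb(mine_facet_items(lists), kb)
print(sorted(facet))  # "xperia" joins via the shared KB type
```

Expansion is what lifts the coverage: items never seen in the sampled results can still enter the facet through the knowledge base.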

Reference IEEE paper:
“Generating Query Facets using Knowledge Bases”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Unique ID – SBI1037

Domain – DATA MINING


Influential Node Tracking on Dynamic Social Network: An Interchange Greedy Approach

Introduction:
As both social network structure and the strength of influence between individuals evolve constantly, it is necessary to track influential nodes in a dynamic setting. To address this problem, we explore the Influential Node Tracking (INT) problem as an extension of the traditional Influence Maximization (IM) problem to dynamic social networks. While the Influence Maximization problem aims at identifying a set of k nodes that maximizes the joint influence under one static network, the INT problem focuses on tracking a set of influential nodes that keeps maximizing the influence as the network evolves. Exploiting the smoothness of the evolution of the network structure, we propose an efficient algorithm, Upper Bound Interchange Greedy (UBI), and a variant, UBI+. Instead of constructing the seed set from scratch, we start from the influential seed set found previously and perform node replacement to improve the influence coverage. Furthermore, by using a fast update method that calculates the marginal gain of nodes, our algorithm can scale to dynamic social networks with millions of nodes. Empirical experiments on three real large-scale dynamic social networks show that UBI and its variant, UBI+, achieve better performance in terms of both influence coverage and running time.
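The interchange idea above can be sketched on a toy graph: start from the previous seed set and apply single-node swaps as long as some swap increases coverage. Influence is simplified here to plain neighbor-set coverage, and the six-node graph is invented; the paper's upper-bound machinery for fast marginal gains is omitted.

```python
# Sketch of interchange greedy: swap seeds while a single swap improves coverage.

GRAPH = {
    "a": {"b", "c"}, "b": {"a"}, "c": {"a", "d"},
    "d": {"c", "e", "f"}, "e": {"d"}, "f": {"d"},
}

def coverage(seeds):
    """Nodes reached: the seeds plus their immediate neighbors."""
    reached = set(seeds)
    for s in seeds:
        reached |= GRAPH[s]
    return len(reached)

def interchange(seeds):
    """Greedy node replacement starting from a previous seed set."""
    seeds = set(seeds)
    improved = True
    while improved:
        improved = False
        for out in list(seeds):
            for cand in GRAPH.keys() - seeds:
                if coverage(seeds - {out} | {cand}) > coverage(seeds):
                    seeds = seeds - {out} | {cand}
                    improved = True
                    break
            if improved:
                break
    return seeds

# A stale seed set from "yesterday's" network converges to full coverage.
print(coverage(interchange({"b", "e"})))
```

Because each swap strictly increases coverage, the loop terminates, and starting from the previous seeds rather than from scratch is what makes the approach cheap under smooth network evolution.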

Reference IEEE paper:
“Influential Node Tracking on Dynamic Social Network: An Interchange Greedy Approach”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Unique ID – SBI1038

Domain – DATA MINING


l-Injection: Toward Effective Collaborative Filtering Using Uninteresting Items

Introduction:
We develop a novel framework, named l-injection, to address the sparsity problem of recommender systems. By carefully injecting low values into a selected set of unrated user-item pairs in a user-item matrix, we demonstrate that the top-N recommendation accuracies of various collaborative filtering (CF) techniques can be significantly and consistently improved. We first adopt the notion of pre-use preferences of users toward the vast number of unrated items. Using this notion, we identify uninteresting items that have not been rated yet but are likely to receive low ratings from users, and selectively impute them as low values. As our proposed approach is method-agnostic, it can easily be applied to a variety of CF algorithms. Through comprehensive experiments with three real-life datasets (i.e., MovieLens, Ciao, and Watcha), we demonstrate that our solution consistently and universally enhances the accuracies of existing CF algorithms (e.g., item-based CF, SVD-based CF, and SVD++) by 2.5 to 5 times on average. Furthermore, our solution improves the running time of those CF methods by 1.2 to 2.3 times when its setting produces the best accuracy.
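The imputation step can be sketched as follows: score each unrated user-item cell with a pre-use preference estimate, and inject a low rating wherever the score falls below a threshold. The tiny matrix, the popularity-based scoring rule, and the threshold are all illustrative stand-ins for the paper's pre-use preference model.

```python
# Rough sketch of l-injection's imputation of "uninteresting" unrated cells.

def inject_low_values(ratings, preuse_score, low=1.0, threshold=0.3):
    """Return a copy of the matrix with uninteresting unrated cells set to `low`."""
    filled = [row[:] for row in ratings]
    for u, row in enumerate(ratings):
        for i, r in enumerate(row):
            if r is None and preuse_score(u, i) < threshold:
                filled[u][i] = low
    return filled

# 2 users x 3 items; None marks an unrated pair.
R = [[5, None, None],
     [None, 4, None]]

# Hypothetical pre-use preference: here, simply item popularity among raters.
def popularity(u, i):
    col = [row[i] for row in R if row[i] is not None]
    return len(col) / len(R)

filled = inject_low_values(R, popularity)
print(filled)  # never-rated item 2 gets low values; popular unrated cells stay None
```

The filled matrix can then be handed to any CF algorithm unchanged, which is what makes the approach method-agnostic.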

Reference IEEE paper:
“l-Injection: Toward Effective Collaborative Filtering Using Uninteresting Items”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Unique ID – SBI1039

Domain – DATA MINING


Mining Competitors from Large Unstructured Datasets

Introduction:
In any competitive business, success is based on the ability to make an item more appealing to customers than the competition. A number of questions arise in the context of this task: how do we formalize and quantify the competitiveness between two items? Who are the main competitors of a given item? What are the features of an item that most affect its competitiveness? Despite the impact and relevance of this problem to many domains, only a limited amount of work has been devoted toward an effective solution. In this paper, we present a formal definition of the competitiveness between two items, based on the market segments that they can both cover. Our evaluation of competitiveness utilizes customer reviews, an abundant source of information that is available in a wide range of domains. We present efficient methods for evaluating competitiveness in large review datasets and address the natural problem of finding the top-k competitors of a given item. Finally, we evaluate the quality of our results and the scalability of our approach using multiple datasets from different domains.
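The definition sketched above, competitiveness grounded in the market segments two items can both cover, can be rendered as a toy computation. The segment labels and item coverage sets below are made-up examples, not data or the exact formalization from the paper.

```python
# Toy competitiveness: the share of the market's segments both items cover.

def competitiveness(segments_a, segments_b, all_segments):
    """Fraction of all market segments that items a and b both cover."""
    return len(segments_a & segments_b) / len(all_segments)

market = {"budget", "business", "family", "luxury"}
hotel_x = {"budget", "family", "business"}
hotel_y = {"family", "business", "luxury"}
hotel_z = {"luxury"}

# hotel_y contests two of the four market segments with hotel_x; hotel_z none.
print(competitiveness(hotel_x, hotel_y, market))  # 0.5
print(competitiveness(hotel_x, hotel_z, market))  # 0.0
```

Ranking every other item by this score against a given item yields its top-k competitors, the query the paper optimizes at scale over review-derived segments.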

Reference IEEE paper:

“Mining Competitors from Large Unstructured Datasets”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Unique ID – SBI1040

Domain – DATA MINING


Modeling Information Diffusion over Social Networks for Temporal Dynamic Prediction

Introduction:
How to model the process of information diffusion in social networks is a critical research task. Although numerous attempts have been made in this area, few can simulate and predict the temporal dynamics of the diffusion process. To address this problem, we propose a novel information diffusion model (the GT model), which treats the users in the network as intelligent agents. An agent jointly considers all of its interacting neighbours and calculates the payoffs of its different choices to make a strategic decision. We introduce a time factor into the user payoff, enabling the GT model not only to predict the behaviour of a user but also to predict when the user will perform the behaviour. Both global influence and social influence are explored in the time-dependent payoff calculation, where a new social influence representation method is designed to fully capture the temporal dynamic properties of social influence between users. Experimental results on Sina Weibo and Flickr validate the effectiveness of our methods.
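A schematic version of the time-dependent payoff described above: an agent's payoff for adopting combines a global term with per-neighbour influence weights that decay with the time elapsed since each neighbour adopted. The weights, decay rate, and exponential-decay form are invented for illustration, not the paper's exact payoff function.

```python
# Sketch of a time-dependent adoption payoff with decaying social influence.
import math

def adoption_payoff(global_influence, neighbor_adoptions, now, decay=0.5):
    """Global term plus neighbor influence weights decayed over elapsed time."""
    social = sum(w * math.exp(-decay * (now - t))
                 for w, t in neighbor_adoptions)
    return global_influence + social

# Two neighbours adopted at t=1 and t=3 with influence weights 0.6 and 0.9.
p_now = adoption_payoff(0.2, [(0.6, 1.0), (0.9, 3.0)], now=4.0)
p_later = adoption_payoff(0.2, [(0.6, 1.0), (0.9, 3.0)], now=8.0)
print(p_now > p_later)  # True: influence fades, so the payoff peaks early
```

Because the payoff varies with `now`, comparing it across time steps is what lets such a model predict not only whether a user adopts but when.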

Reference IEEE paper:
“Modeling Information Diffusion over Social Networks for Temporal Dynamic Prediction”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Unique ID – SBI1041

Domain – DATA MINING


Provably Secure Key-Aggregate Cryptosystems with Broadcast
Aggregate Keys for Online Data Sharing on the Cloud

Introduction:
Online data sharing for increased productivity and efficiency is one of the primary requirements today for any organization. The advent of cloud computing has pushed the limits of sharing across geographical boundaries, and has enabled a multitude of users to contribute and collaborate on shared data. However, protecting online data is critical to the success of the cloud, which leads to the requirement of efficient and secure cryptographic schemes for the same. Data owners would ideally want to store their data/files online in an encrypted manner, and delegate decryption rights for some of these to users, while retaining the power to revoke access at any point of time. An efficient solution in this regard would be one that allows users to decrypt multiple classes of data using a single key of constant size that can be efficiently broadcast to multiple users. Chu et al. proposed a key-aggregate cryptosystem (KAC) in 2014 to address this problem, albeit without formal proofs of security. In this paper, we propose CPA- and CCA-secure KAC constructions that are efficiently implementable using elliptic curves and are suitable for implementation in cloud-based data sharing environments. We lay special focus on how the standalone KAC scheme can be efficiently combined with broadcast encryption to cater to m data users and m′ data owners, while reducing the secure channel requirement from O(m·m′) in the standalone case to O(m + m′).
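What "a single constant-size key for many data classes" means can be demonstrated with a classic Akl–Taylor / RSA-accumulator-style toy, which is not the paper's elliptic-curve construction: each class gets a public prime, and the aggregate key for a subset of classes is the master secret raised to the product of the primes of the excluded classes. The parameters below are tiny and utterly insecure; everything here is illustrative only.

```python
# Toy key aggregation: one small value delegates decryption for a chosen
# subset of classes; deriving keys for uncovered classes would need prime roots.

N = 3233                      # toy RSA modulus (61 * 53), far too small for real use
MASTER = 5                    # data owner's master secret
CLASS_PRIMES = {"photos": 3, "mail": 5, "finance": 7}  # one public prime per class

def aggregate_key(classes):
    """Constant-size key for the given classes: raise MASTER to excluded primes."""
    exp = 1
    for name, prime in CLASS_PRIMES.items():
        if name not in classes:
            exp *= prime
    return pow(MASTER, exp, N)

def class_key_from_aggregate(agg, classes, target):
    """Key holder derives the per-class key for any class the aggregate covers."""
    assert target in classes, "aggregate key does not cover this class"
    exp = 1
    for name in classes:
        if name != target:
            exp *= CLASS_PRIMES[name]
    return pow(agg, exp, N)

def true_class_key(target):
    """Ground truth: MASTER raised to the product of all primes except the target's."""
    exp = 1
    for name, prime in CLASS_PRIMES.items():
        if name != target:
            exp *= prime
    return pow(MASTER, exp, N)

agg = aggregate_key({"photos", "mail"})   # one value, two delegated classes
print(class_key_from_aggregate(agg, {"photos", "mail"}, "photos") == true_class_key("photos"))
```

The aggregate key stays one group element regardless of how many classes it covers, which is the property that makes broadcasting it to many users cheap.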

Reference IEEE paper:
“Provably Secure Key-Aggregate Cryptosystems with Broadcast Aggregate Keys for Online Data Sharing on the Cloud”, IEEE Transactions on Computers, 2017.

Unique ID – SBI1021

Domain – CLOUD COMPUTING


Achieving Efficient and Secure Data Acquisition for Cloud supported
Internet of Things in Smart Grid

Introduction:
Cloud-supported Internet of Things (Cloud-IoT) has been broadly deployed in smart grid systems. The IoT front-ends are responsible for data acquisition and status supervision, while the substantial amount of data is stored and managed in the cloud server. Achieving data security and system efficiency in the data acquisition and transmission process is significant and challenging, because the power-grid-related data is sensitive and huge in amount. In this paper, we present an efficient and secure data acquisition scheme based on CP-ABE (Ciphertext-Policy Attribute-Based Encryption). Data acquired from the terminals is partitioned into blocks and encrypted with its corresponding access sub-tree in sequence, so that data encryption and data transmission can proceed in parallel. Furthermore, we protect the information about the access tree with a threshold secret sharing method, which preserves data privacy and integrity against users with unauthorized sets of attributes. The formal analysis demonstrates that the proposed scheme can fulfill the security requirements of Cloud-supported IoT in the smart grid. The numerical analysis and experimental results indicate that our scheme can effectively reduce the time cost compared with other popular approaches.
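The threshold secret sharing step mentioned above can be illustrated with plain Shamir sharing over a prime field: the secret guarding the access-tree information is split into n shares so that any t of them reconstruct it, while fewer reveal nothing. The prime, the secret, and the share counts below are toy values, and this is generic Shamir sharing rather than the paper's exact construction.

```python
# Shamir (t, n) threshold secret sharing over a Mersenne-prime field.
import random

P = 2**127 - 1       # a Mersenne prime, used as the field modulus
random.seed(7)       # fixed seed so the sketch is reproducible

def split(secret, n, t):
    """Evaluate a random degree-(t-1) polynomial with f(0)=secret at x=1..n."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x=0 from exactly t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456789, n=5, t=3)
print(reconstruct(shares[:3]) == 123456789)  # any 3 of the 5 shares suffice
```

Distributing the shares among the attribute authorities or terminals is what keeps any unauthorized coalition below the threshold from learning the access-tree secret.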

Reference IEEE paper:
“Achieving Efficient and Secure Data Acquisition for Cloud-supported Internet of Things in Smart Grid”, IEEE Internet of Things Journal, 2017.

Unique ID – SBI1007

Domain – CLOUD COMPUTING


Achieving secure universal and fine-grained query results
verification for secure search scheme over encrypted cloud data

Introduction:
Secure search techniques over encrypted cloud data allow an authorized user to query data files of interest by submitting encrypted query keywords to the cloud server in a privacy-preserving manner. However, in practice, the returned query results may be incorrect or incomplete in a dishonest cloud environment. For example, the cloud server may intentionally omit some qualified results to save computational resources and communication overhead. Thus, a well-functioning secure query system should provide a query results verification mechanism that allows the data user to verify the results. In this paper, we design a secure, easily integrated, and fine-grained query results verification mechanism by which, given an encrypted query results set, the query user can not only verify the correctness of each data file in the set but also, before decryption, check how many and which qualified data files were not returned if the set is incomplete. The verification scheme is loosely coupled to the concrete secure search technique and can be very easily integrated into any secure query scheme. We achieve this goal by constructing a secure verification object for the encrypted cloud data. Furthermore, a short signature technique with extremely small storage cost is proposed to guarantee the authenticity of the verification object, and a verification object request technique is presented to allow the query user to securely obtain the desired verification object. Performance evaluation shows that the proposed schemes are practical and efficient.
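The completeness check can be sketched in simplified form: here the verification object is just the set of digests of all qualified file IDs, which lets the user count and identify withheld files. A real scheme compresses and authenticates this object (e.g., with the short signatures the paper proposes); the function names and file IDs are illustrative.

```python
# Simplified verification object: detect how many and which qualified files
# a dishonest server withheld from the query results.
import hashlib

def h(file_id):
    return hashlib.sha256(file_id.encode()).hexdigest()

def build_verification_object(qualified_ids):
    """Data owner side: one digest per file qualifying for the keyword."""
    return {h(i) for i in qualified_ids}

def check_results(returned_ids, vo):
    """User side: digests of qualified files missing from the returned set."""
    missing = vo - {h(i) for i in returned_ids}
    return len(missing), missing

vo = build_verification_object(["f1", "f2", "f3"])
count, _ = check_results(["f1", "f3"], vo)   # a dishonest server omitted f2
print(count)  # 1
```

Because the check works on digests, it runs before any decryption, matching the "verify first, decrypt after" flow described above.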

Reference IEEE paper:
“Achieving secure, universal, and fine-grained query results verification for secure search scheme over encrypted cloud data”, IEEE Transactions on Cloud Computing, 2017.

Unique ID – SBI1008

Domain – CLOUD COMPUTING


Assessing Invariant Mining Techniques for Cloud-based Utility
Computing Systems

Introduction:
Likely system invariants model properties that hold in the operating conditions of a computing system. Invariants may be mined offline from training datasets, or inferred during execution. Scientific work has shown that invariant mining techniques support several activities, including capacity planning and the detection of failures, anomalies, and violations of Service Level Agreements. However, their practical application by operations engineers is still a challenge. We aim to fill this gap through an empirical analysis of three major techniques for mining invariants in cloud-based utility computing systems: clustering, association rules, and decision lists. The experiments use independent datasets from real-world systems: a Google cluster, whose traces are publicly available, and a Software-as-a-Service platform used by various companies worldwide. We assess the techniques in two invariant applications, namely execution characterization and anomaly detection, using the metrics of coverage, recall, and precision. A sensitivity analysis is performed. The experimental results allow us to infer practical usage implications, showing that relatively few invariants characterize the majority of operating conditions, that precision and recall may drop significantly when trying to achieve large coverage, and that the techniques exhibit similar precision, though the supervised one achieves a higher recall. Finally, we propose a general heuristic for selecting likely invariants from a dataset.
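A minimal illustration of the invariant idea: learn per-metric value ranges from executions observed in normal operation, then flag executions that violate them. The metric names and numbers are fabricated, and the paper's three techniques (clustering, association rules, decision lists) are considerably richer than this range baseline.

```python
# Range invariants mined offline from training runs, applied to anomaly detection.

def mine_range_invariants(training_runs):
    """One (min, max) invariant per metric, from runs seen in normal operation."""
    keys = training_runs[0].keys()
    return {k: (min(r[k] for r in training_runs),
                max(r[k] for r in training_runs)) for k in keys}

def violations(run, invariants):
    """Metrics of this run that fall outside their learned range."""
    return [k for k, (lo, hi) in invariants.items() if not lo <= run[k] <= hi]

training = [{"cpu": 0.30, "mem": 0.50}, {"cpu": 0.45, "mem": 0.60},
            {"cpu": 0.40, "mem": 0.55}]
inv = mine_range_invariants(training)
print(violations({"cpu": 0.95, "mem": 0.58}, inv))  # ['cpu']
```

The coverage/precision trade-off discussed above shows up even here: widening the training set loosens the ranges, covering more normal runs but catching fewer anomalies.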

Reference IEEE paper:
“Assessing Invariant Mining Techniques for Cloud-based Utility Computing Systems”, IEEE Transactions on Services Computing, 2017.

Unique ID – SBI1009

Domain – CLOUD COMPUTING


Customer-Satisfaction-Aware Optimal Multiserver Configuration
for Profit Maximization in Cloud Computing

Introduction:
Along with the development of cloud computing, an increasing number of enterprises have started to adopt cloud services, which has promoted the emergence of many cloud service providers. For cloud service providers, how to configure their cloud service platforms to obtain the maximum profit has increasingly become the focus of their attention. In this paper, we take customer satisfaction into consideration to address this problem. Customer satisfaction affects the profit of cloud service providers in two ways. On one hand, the cloud configuration affects the quality of service, which is an important factor affecting customer satisfaction. On the other hand, customer satisfaction affects the request arrival rate of a cloud service provider. However, few existing works take customer satisfaction into consideration when solving the profit maximization problem, and those that do consider customer satisfaction do not give a properly formalized definition for it. Hence, we first refer to the definition of customer satisfaction in economics and develop a formula for measuring customer satisfaction in cloud computing. Then, a detailed analysis is given of how customer satisfaction affects the profit. Lastly, taking into consideration customer satisfaction, service-level agreements, renting price, energy consumption, and so forth, a profit maximization problem is formulated and solved to obtain the optimal configuration that maximizes profit.
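The feedback loop described above can be sketched as a deliberately simplified optimization: for each server count, model the load, map it to a satisfaction level that scales the arrival rate, and pick the configuration maximizing revenue minus renting and energy cost. All constants, including the satisfaction curve, are invented for this sketch; the paper derives the satisfaction formula from economics and uses a proper queueing model.

```python
# Toy multiserver profit scan with satisfaction feeding back into demand.

def profit(m, base_rate=100.0, service_rate=12.0, price=5.0,
           rent=15.0, energy=5.0):
    """Revenue from satisfied demand minus renting and energy cost of m servers."""
    utilization = base_rate / (m * service_rate)   # crude load proxy
    if utilization >= 1.0:
        satisfaction = 0.0                          # overloaded: customers leave
    else:
        satisfaction = 1.0 - utilization ** 4       # hypothetical satisfaction curve
    arrivals = base_rate * satisfaction             # satisfaction scales demand
    return arrivals * price - m * (rent + energy)

best = max(range(1, 30), key=profit)
print(best, round(profit(best), 1))
```

Too few servers destroy satisfaction (and hence demand), while too many inflate rental and energy cost; the optimum sits between the two, which is exactly the trade-off the formulated problem captures.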

Reference IEEE paper:
“Customer-Satisfaction-Aware Optimal Multiserver Configuration for Profit Maximization in Cloud Computing”, IEEE Transactions on Sustainable Computing, 2017.

Unique ID – SBI1010

Domain – CLOUD COMPUTING


Identity Based Data Outsourcing with Comprehensive Auditing in
Clouds

Introduction:
Cloud storage systems provide convenient file storage and sharing services for distributed clients. To address the integrity, controllable outsourcing, and origin auditing concerns on outsourced files, we propose an identity-based data outsourcing (IBDO) scheme equipped with desirable features advantageous over existing proposals for securing outsourced data. First, our IBDO scheme allows a user to authorize dedicated proxies to upload data to the cloud storage server on her behalf; e.g., a company may authorize some employees to upload files to the company’s cloud account in a controlled way. The proxies are identified and authorized by their recognizable identities, which eliminates the complicated certificate management of usual secure distributed computing systems. Second, our IBDO scheme facilitates comprehensive auditing, i.e., our scheme not only permits regular integrity auditing, as in existing schemes for securing outsourced data, but also allows auditing the information on the origin, type, and consistency of outsourced files. Security analysis and experimental evaluation indicate that our IBDO scheme provides strong security with desirable efficiency.
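The delegated-upload idea can be sketched in bare-bones form: the owner issues a token binding a proxy's recognizable identity to her account, and the server accepts an upload only if the token verifies. The paper uses identity-based cryptography for this; the HMAC construction, key, and identities below are stand-ins for illustration only.

```python
# Toy identity-bound upload delegation: owner authorizes, server verifies.
import hmac, hashlib

OWNER_KEY = b"owner-master-secret"   # shared with the storage server in this toy

def authorize_proxy(proxy_identity):
    """Owner side: a token tying the proxy's identity to the owner's account."""
    return hmac.new(OWNER_KEY, proxy_identity.encode(), hashlib.sha256).hexdigest()

def accept_upload(proxy_identity, token):
    """Server side: verify the token before storing the proxy's file."""
    expected = authorize_proxy(proxy_identity)
    return hmac.compare_digest(expected, token)

token = authorize_proxy("alice@company.example")
print(accept_upload("alice@company.example", token))   # True
print(accept_upload("mallory@company.example", token)) # False
```

Because the token is bound to an identity rather than a certificate, nothing beyond the identity string needs to be managed, which mirrors the certificate-free property the scheme aims for; recording who uploaded what is also what later enables origin auditing.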

Reference IEEE paper:
“Identity-Based Data Outsourcing with Comprehensive Auditing in Clouds”, IEEE Transactions on Information Forensics and Security, 2017.

Unique ID – SBI1013

Domain – CLOUD COMPUTING
