A parallel algorithm for low-rank tensor decomposition that is especially well-suited for big tensors is proposed. The new algorithm is based on parallel processing of a set of randomly compressed, reduced-size ‘replicas’ of the big tensor. Each replica is independently decomposed, and the results are joined via a master linear equation per tensor mode. The approach enables massive parallelism with...
Given a large image set, in which very few images have labels, how to guess labels for the remaining majority? How to spot images that need brand new labels different from the predefined ones? How to summarize these data to route the user’s attention to what really matters? Here we answer all these questions. Specifically, we propose QuMinS, a fast, scalable solution to two problems: (i) Low-labor...
How do we find patterns and anomalies, on graphs with billions of nodes and edges, which do not fit in memory? How to use parallelism for such terabyte-scale graphs? In this work, we focus on inference, which often corresponds, intuitively, to “guilt by association” scenarios. For example, if a person is a drug abuser, their friends probably are, too; if a node in a social network is of male gender,...
JavaScript code is often obfuscated; given such code, can we tell whether it is malicious or benign? We propose Obfuscating Causal Relations Finding (OCRF), which addresses this problem. The contributions are the following: (1) careful feature extraction, using domain knowledge, (2) no need for de-obfuscation, since our method can be applied to the obfuscated script directly, (3) combined obfuscation...
How do connected components evolve? What are the regularities that govern the dynamic growth process and the static snapshot of the connected components? In this work, we study patterns in connected components of large, real-world graphs. First, we study one of the largest static Web graphs with billions of nodes and edges and analyze the regularities among the connected components using GFD (Graph...
What do graphs look like? How do they evolve over time? How to handle a graph with a billion nodes? We present a comprehensive list of static and temporal laws, and some recent observations on real graphs (e.g., "eigenSpokes"). For generators, we describe some recent ones, which naturally match all of the known properties of real graphs. Finally, for tools, we present "oddball"...
In a large weighted graph, how can we detect suspicious subgraphs, patterns, and outliers? A suspicious pattern could be a near-clique or a set of nodes bridging two or more near-cliques. This would improve intrusion detection in computer networks and network traffic monitoring. Are there other network patterns that need to be detected? We propose EigenDiagnostics, a fast algorithm that spots such...
Given a user in a social network, which new friends should we recommend, the dual goal being to achieve user satisfaction and good network connectivity? Similarly, which new products are better to recommend to satisfy customers' taste/needs as well as increase vendor profit? Typical recommender systems use merely past purchases, product ratings, demographic meta-data, and network 'proximity' to make...
We propose a graphical signature for intrusion detection given alert sequences. By correlating alerts with their temporal proximity, we build a probabilistic graph-based model to describe a group of alerts that form an attack or normal behavior. Using the models, we design a pairwise measure based on manifold learning to measure the dissimilarities between different groups of alerts. A large dissimilarity...
We report a surprising, persistent pattern in an important class of large sparse social graphs, which we term eigenspokes. We focus on large mobile call graphs, spanning hundreds of thousands of nodes and edges, and find that the singular vectors of these graphs exhibit a striking eigenspokes pattern wherein, when plotted against each other, they have clear, separate lines that often neatly align...
In this paper, we describe PEGASUS, an open source peta graph mining library which performs typical graph mining tasks such as computing the diameter of the graph, computing the radius of each node and finding the connected components. As the size of graphs reaches several giga-, tera- or peta-bytes, the necessity for such a library grows too. To the best of our knowledge, PEGASUS is the first such...
Triangle counting is an important problem in graph mining. The clustering coefficient and the transitivity ratio, two commonly used measures, effectively quantify the triangle density in order to capture the fact that friends of friends tend to be friends themselves. Furthermore, several successful graph mining applications rely on the number of triangles. In this paper, we study the problem of counting...
We present Graphite, a system that allows the user to visually construct a query pattern, finds both its exact and approximate matching subgraphs in large attributed graphs, and visualizes the matches. For example, in a social network where a person's occupation is an attribute, the user can draw a 'star' query for "finding a CEO who has interacted with a Secretary, a Manager, and an Accountant,...
How do real, weighted graphs change over time? What patterns, if any, do they obey? Earlier studies focus on unweighted graphs, and, with few exceptions, they focus on static snapshots. Here, we report patterns we discover on several real, weighted, time-evolving graphs. The reported patterns can help in detecting anomalies in natural graphs, in making link prediction and in providing more criteria...
Vector data are normally used for probabilistic graphical models with Bayesian inference. However, tensor data, i.e., multidimensional arrays, are actually natural representations of a large amount of real data, in data mining, computer vision, and many other applications. Aiming at breaking the huge gap between vectors and tensors in conventional statistical tasks, e.g., automatic model selection,...
We propose an approach for learning visual models of object categories in an unsupervised manner in which we first build a large-scale complex network which captures the interactions of all unit visual features across the entire training set and we infer information, such as which features are in which categories, directly from the graph by using link analysis techniques. The link analysis techniques...
Given publication titles and authors, what can we say about the evolution of scientific topics and communities over time? Which communities shrank, which emerged, and which split, over time? And, when in time were the turning points? We propose TimeFall, which can automatically answer these questions given a social network/graph that evolves over time. The main novelty of the proposed approach is...
Similarity joins have attracted significant interest, with applications in geographical information systems, astronomy, marketing analyses, and anomaly detection. However, all the past algorithms, although highly fine-tuned, suffer an output explosion if the query range is even moderately large relative to the local data density. Under such circumstances, the response time and the search effort are...
Online auctions have revolutionized the ability of people to buy and sell items without middlemen, with sales reaching more than $57 billion every year on eBay alone. The user interactions at online auctions form a network of interactions akin to a social network. Unlike other online social networks, online auction networks have not been studied so far. In this paper, we model and characterize the...
Fraud detection has become a common concern of the online auction Web sites. Fraudsters often manipulate reputation systems and commit nondelivery fraud. To deal with fraud in group behavior we consider network level features, such as users' beliefs of other users. In this paper we use the loopy belief propagation algorithm and apply it to network level fraud detection, classifying fraudsters, accomplices,...