Research Interests

Introduction

As a PhD student, I am investigating the problem of inducing Stochastic Graph Grammars, and of exploring the utility of such grammatical models. I am pursuing research at the
CORAL Lab at UMBC, under the supervision of Dr. Tim Oates . Here, I will briefly explain what graph grammars are, and why they are important (and interesting!).

Graph Grammars

A grammar is a set of rules that describes a language. A language is a collection of sentences; a sentence, in turn, is a linear sequence of words. A graph grammar, on the other hand, describes languages that are collections of graphs, instead of linear sentences. In a few minutes, I will describe why such grammars are useful; however, first we need a few more definitions.

Stochastic Graph Grammars

A grammar tells us whether a given sentence belongs to a language or not. A stochastic grammar, additionally, gives us the probability that the sentence was generated by the grammar. Similarly, a stochastic graph grammar tells us how probable it is for a given graph to have been generated by the grammar. A stochastic graph grammar is, thus, a model for a probability distribution of graphs. Such models are typically useful in scenarios where the graphs have a hierarchical structure. Non-terminal symbols represent subgraphs that behave as a unit; such a subgraph can itself be a collection of smaller subgraphs, that are units in their own right. The hierarchical and recursive nature of grammars makes them ideal as models for such distributions. As the next section shows, there are several real life scenarios where graph grammars provide a feasible tool for probabilistic modeling.

Applications

We now consider the following domains in which stochastic graph grammars are useful.
  • Understanding the Structure of Large Chemical Compounds

    Although the structure of many organic compounds, such as proteins, may be huge, such structures tend to be heirarchical in nature. A molecule is composed of a collection of functional groups , which are groups of atoms that behave as a single unit. A functional group, in turn, may be composed of smaller functional groups. Finally, it is often the case that functional groups can be partitioned into classes, where all members of the same class can be represented by the same formula. A simple example is CnH2n+1 that represents all alkyl groups. These properties clearly indicate that organic molecules are amenable to grammatical representations.

    How are such models useful? Once a grammar has been learned for a family of such molecules, we can generate samples from the underlying distribution of those molecules, assisting in compound design. Besides, given a molecule, we can calculate the probability that it was generated by this distribution. Intuitively, more stable molecules are more probable. Thus, such analysis can give us insight into the stability of the compound.

  • Understanding and Classifying Social Networks

    Social networks can be represented as graphs, where nodes represent individuals, and edges represent acquaintance. Like organic molecules, social networks, too, have a hierarchical, recursive structure. A network may be composed of several communities; a community itself maybe a collection of several smaller communities. If a family of social networks can be modeled using a stochastic graph grammar, then, given a new network, we can tell how likely it is that the network belongs to the same family. This can be useful in classifying networks, and, in partcular, identifying malicious ones.