Graphia Pro:Input Formats

From Kajeka Wiki

Graphia Pro supports the input of data in a wide variety of data types and file formats. Some are simple whitespace-separated text files, others are more complex XML-based formats.

If your data does not load but the example files do, the most likely explanation is that the input format is incorrect (see below). Where possible, the program will report an error indicating the nature of the problem. Below is an explanation of each supported data input format, together with example files.

Unweighted simple pairwise (.txt, .tgf)

An example (whitespace-separated) graph file

This will create a simple directional network in which each line of the input file defines two nodes connected by an edge. The node in column one is the source of the edge and the node in column two is the recipient. If the relationships have no directionality, the arrowhead (only shown in 2D mode) can be turned off in the ‘General tab’. Singleton nodes can be added with a line in which a node connects to itself. The basic format is shown in the example and can be saved from a text editor, Excel or the like.
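As a rough illustration of the layout described above, the following Python sketch writes and re-reads a simple pairwise file. The function names and node names are hypothetical, and the sketch assumes node names contain no white space, since the quoting rules are not described here.

```python
import io


def write_pairwise(edges, handle):
    """Write one 'source<TAB>target' pair per line."""
    for source, target in edges:
        handle.write(f"{source}\t{target}\n")


def read_pairwise(handle):
    """Read white space separated pairs, skipping blank lines."""
    edges = []
    for number, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 2 columns, found {len(fields)}")
        edges.append((fields[0], fields[1]))
    return edges


# Hypothetical nodes; the last line makes D a singleton by linking it to itself.
edges = [("A", "B"), ("B", "C"), ("D", "D")]

buffer = io.StringIO()
write_pairwise(edges, buffer)
pairwise_text = buffer.getvalue()

parsed = read_pairwise(io.StringIO(pairwise_text))
nodes = sorted({name for edge in parsed for name in edge})
```

Writing to a real file instead of an in-memory buffer only requires passing an open file handle to `write_pairwise`.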

For the link below please use right mouse click and ‘Save Target As…’ or ‘Save Link As…’ in IE/Mozilla Firefox

London Tube Map (Simple Pairwise .txt) Download

An example (whitespace-separated) weighted graph file

Weighted simple pairwise (.txt, .tgf)

This is an extension of the simple pairwise format which also adds a weight to each edge. An edge weight is used when not all edges are equal, e.g. some are of a higher confidence than others. Edge weights may be used for filtering, or visualized as edge colour or thickness. The format is a one-column extension of the previous format, adding a single numeric weight to the definition of each pairwise relationship. Weights should normally span a linear range, on whatever scale is appropriate.
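The three-column layout can be sketched in Python as follows. This is only an illustration of the idea of filtering by weight, not a description of how Graphia Pro does it internally; the function names, sample data and the threshold of 0.7 are all assumptions.

```python
def parse_weighted_pairwise(text):
    """Parse 'source target weight' lines into (source, target, weight) tuples."""
    edges = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 columns, found {len(fields)}")
        source, target, weight = fields
        edges.append((source, target, float(weight)))
    return edges


def filter_by_weight(edges, minimum):
    """Keep only the edges whose weight is at least `minimum`."""
    return [edge for edge in edges if edge[2] >= minimum]


# Hypothetical data: the third column is the edge weight.
sample = "A B 0.9\nB C 0.4\nC D 0.75\n"

weighted_edges = parse_weighted_pairwise(sample)
strong_edges = filter_by_weight(weighted_edges, 0.7)
```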


Numerical Data Files (.csv file extension)

This is one of the most useful data input formats for Graphia Pro. It allows any table of numerical data to be converted into a network based on correlation (similarity), allowing entities represented by the columns and rows to be compared. It is a simple format but files need careful construction if you are to get the most from an analysis.

General Layout for Graphia Pro compatible .csv files

The general assumption is that you have a numerical matrix arranged in columns and rows. A number of entities (people, devices, companies, genes, etc.) are listed in rows, each with values associated with it over a range of conditions, measured variables, samples or time-points arranged in columns. Entities occupy consecutive rows, and the first column should hold a unique identifier for each entity. This will be the name of the node. For this reason, and because this is the column that is searched when looking for a node within the application, it is advisable to use a memorable identifier. Other information about the entities (metadata), if available, can be added in subsequent columns as ‘class sets’. A class set is a series of terms that describe the entity or its properties. This information can be used to help mine the data within the application. Any number of class sets can be included with the data.

Following this are columns of numerical data. They may represent the amount (value) of an entity over time, over different conditions/samples, numerical answers to a questionnaire, or different properties of the entity, e.g. different nutrients in a food. Each column must have a header, and it is useful if the name of the column is short, informative and unique. Columns should also be arranged in a logical order, e.g. according to time or with measurements from similar samples grouped together.

As with entities, a column can be associated with other information. For instance, if a column represents measurements taken from a clinical sample, we might know the donor's age, race, gender, diet, drug treatment etc. This information can be added below the header row, with each row of metadata representing a different property (class set) and the name of the column class set placed in the first column.

The data itself can be discrete (e.g. 1 or 0) or continuous, positive or negative, and must be arranged in a matrix of columns and rows. The software does not know or care where your numbers come from; it will treat them the same regardless. There must be no missing values, text characters or other stray content within the numerical part of the input data.
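Before loading a file it can help to check these rules outside the application. The Python sketch below is one possible pre-flight check, not part of Graphia Pro: the function name is hypothetical, and the number of entity class-set columns and column class-set rows must be supplied by the caller, because how the application detects them is not described here.

```python
import csv
import io
import math


def check_numeric_table(text, n_class_columns=0, n_column_class_rows=0):
    """Return a list of problems found in a .csv laid out as described above.

    Assumed layout: header row, then `n_column_class_rows` rows of column
    class sets, then one row per entity. Column 0 holds the identifier,
    the next `n_class_columns` columns hold entity class sets, and the
    remaining columns hold the numbers.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    header = rows[0]
    body = rows[1 + n_column_class_rows:]
    first_data_col = 1 + n_class_columns
    data_headers = header[first_data_col:]
    problems = []

    if len(set(data_headers)) != len(data_headers):
        problems.append("column headers are not unique")
    identifiers = [row[0] for row in body]
    if len(set(identifiers)) != len(identifiers):
        problems.append("row identifiers are not unique")

    for row in body:
        if len(row) != len(header):
            problems.append(f"{row[0]}: expected {len(header)} fields, found {len(row)}")
            continue
        for name, cell in zip(data_headers, row[first_data_col:]):
            try:
                value = float(cell)
            except ValueError:
                problems.append(f"{row[0]} / {name}: {cell!r} is not a number")
                continue
            if not math.isfinite(value):
                problems.append(f"{row[0]} / {name}: {cell!r} is not finite")
    return problems


# Hypothetical table: one entity class set (Family), one column class set (Timepoint).
good_table = (
    "ID,Family,T0,T1\n"
    "Timepoint,,0h,1h\n"
    "geneA,kinase,1.5,2.0\n"
    "geneB,ligand,-0.3,0\n"
)
bad_table = good_table.replace("-0.3,0", ",n/a")  # a missing value and a text cell

good_problems = check_numeric_table(good_table, n_class_columns=1, n_column_class_rows=1)
bad_problems = check_numeric_table(bad_table, n_class_columns=1, n_column_class_rows=1)
```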

Files can be prepared in a program such as Excel and saved as a comma-separated values (.csv) file. See example data. For the link below please use right mouse click and ‘Save Target As…’ or ‘Save Link As…’ in IE/Mozilla Firefox

Layout Files (.layout file extension)

Layout files are a file format that can be read and saved by Graphia Pro. A layout file describes a network's characteristics and preserves all the information from the saved network, such that when reloaded it is an exact replica of the original graph. The basic format consists of the definition of the node-edge relationships followed by information pertaining to the visual and positional specification of nodes. It will also store associations with data files. In essence it is a text file, and as such it can be generated outside of the application and used to define a graph prior to loading into the tool.

London Tube Graph (.layout) Download

Description of the .matrix format used by Graphia Pro

Matrix files (.matrix file extension)

Graphia Pro will generate a correlation matrix from any set of numbers (saved as a .csv file) using the Pearson or Spearman rank correlation coefficient. Other similarity measures may be required when comparing other types of entities, e.g. protein or DNA sequences. A similarity matrix file using any algorithm may be calculated outside the tool and then loaded into Graphia Pro. On opening a .matrix file, a Matrix CutOff dialog will appear asking the user to define the threshold above which relationships will be plotted. Once the threshold is selected, all relationships above it will be displayed.
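To make the idea of a correlation cutoff concrete, here is a small Python sketch that computes Pearson correlations between rows and keeps the pairs above a threshold. It illustrates the principle only: it is not Graphia Pro's implementation, it does not write the .matrix file format, and the function names, data and cutoff of 0.9 are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [value - mean_x for value in x]
    dy = [value - mean_y for value in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


def edges_above(profiles, cutoff):
    """Return (name, name, r) for every pair of rows with r above `cutoff`."""
    names = sorted(profiles)
    edges = []
    for index, first in enumerate(names):
        for second in names[index + 1:]:
            r = pearson(profiles[first], profiles[second])
            if r > cutoff:
                edges.append((first, second, r))
    return edges


# Hypothetical rows of a numerical table, keyed by row identifier.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

correlated = edges_above(profiles, cutoff=0.9)
```

Here only geneA and geneB rise together, so they form the single edge that survives the cutoff; geneC is strongly anti-correlated with both and is left unconnected.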

Correlation file of GNF data. Overlap graph of splice variant

Graph Modelling Language Files (.gml file extension)

GML (Graph Modelling Language) is a hierarchical ASCII-based file format for describing graphs. It is used as a standard file exchange format by many network analysis tools and is widely supported by graphing software. It defines features such as nodes and edges that are used for graph drawing.
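For readers unfamiliar with the format, the Python sketch below emits a minimal GML graph using only the basic `graph`, `node` and `edge` blocks. The function name and node labels are hypothetical, labels containing double quotes are rejected rather than escaped, and which GML attributes Graphia Pro actually reads is not covered here.

```python
def to_gml(labels, edges, directed=True):
    """Return a minimal GML document for the given node labels and edges."""
    ids = {label: index for index, label in enumerate(labels)}
    lines = ["graph [", f"  directed {1 if directed else 0}"]
    for label, index in ids.items():
        if '"' in label:
            raise ValueError(f"label {label!r} contains a double quote")
        lines += ["  node [", f"    id {index}", f'    label "{label}"', "  ]"]
    for source, target in edges:
        lines += [
            "  edge [",
            f"    source {ids[source]}",
            f"    target {ids[target]}",
            "  ]",
        ]
    lines.append("]")
    return "\n".join(lines) + "\n"


# Hypothetical three-node chain: A -> B -> C.
gml_text = to_gml(["A", "B", "C"], [("A", "B"), ("B", "C")])
```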

Cylinder Graph (.gml) Download

Flower Graph (.gml) Download

GraphML files (.graphml file extension)

GraphML is a comprehensive file format for graphs. It has an XML-based core to describe the structural properties of a graph and a flexible extension mechanism to add application-specific data. It is used by a number of network editing programs. Not all GraphML variants are the same. We support the variant generated by the yEd network editing tool. This application has been used extensively by the authors for the editing and layout of networks and biological pathways. Any graph drawn using this package and saved as a .graphml file can now be loaded directly into Graphia Pro.
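The structural core of a GraphML file can be inspected with Python's standard library, as in the sketch below. It reads only the core `node` and `edge` elements and ignores application-specific extensions such as those yEd adds for layout; the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the GraphML core elements.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def read_graphml_structure(text):
    """Return (node ids, (source, target) pairs) from a GraphML document."""
    root = ET.fromstring(text)
    nodes = [node.get("id") for node in root.iter(GRAPHML_NS + "node")]
    edges = [
        (edge.get("source"), edge.get("target"))
        for edge in root.iter(GRAPHML_NS + "edge")
    ]
    return nodes, edges


# Hypothetical minimal document with two nodes and one directed edge.
sample_graphml = """
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="n0"/>
    <node id="n1"/>
    <edge id="e0" source="n0" target="n1"/>
  </graph>
</graphml>
"""

graphml_nodes, graphml_edges = read_graphml_structure(sample_graphml)
```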

IFNB Pathway (.GraphML) Download


Web Ontology Language (.owl file extension)

The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies. Ontologies are a formal way to describe taxonomies and classification networks, essentially defining the structure of knowledge for various domains: the nouns representing classes of objects and the verbs representing relations between the objects.

Influenza A Virus network (.owl) Download