Purpose of SyMBA

Systems and other integrative approaches to biology generate a wide variety of genome-scale and high-throughput data that must be annotated and reliably stored. Novel data repositories are required that not only allow the storage of multiple datasets of differing types, but also capture the metadata required to record the experimental details and provenance in a standard format that is amenable to data sharing. The Functional Genomics Experiment Mark-Up Language (FuGE-ML) and associated Object Model (FuGE-OM) were created to facilitate the development and uptake of metadata standards. FuGE developers have also created a number of base Software ToolKits (STKs) that are good starting-points for further development of FuGE-based tools. Systems and Molecular Biology Data and Metadata Archive (SyMBA) is based on FuGE Version 1. SyMBA is based on the FuGE XSD and archives, stores, and retrieves raw high-throughput data. SyMBA integrates both multiple 'omics' data types and information about experiments in a single database.

SyMBA is a flexible data repository specifically designed to address the requirements for primary data storage for projects requiring the storage of many different data types. The metadata database stores all information using the FuGE XSD and hyperjaxb3 to create a FuGE-structured database, and as such can integrate all metadata using the same structure, irrespective of associated data type. It is one of the first database implementations of Version 1 of the FuGE-OM. SyMBA features a metadata database, general and expert user interfaces, and utilizes a Life Science Identifier resolution and assigning service to uniquely identify objects and provide programmatic access to the database.

FuGE, an emerging data standard, provides fast development time in the short term, and an export format that is easy to extend and share in the longer term. Our implementation of FuGE also acts as an example for others wishing to apply FuGE for their own purposes. We encourage other groups with similar needs to install, evaluate, and contribute to its development. The most up-to-date version of this documentation can be found in the SyMBA SourceForge Subversion repository. The system - and its documentation - is available for download and installation via the SourceForge project site ( ).

Relation to the FuGE Toolkit Projects

Some sections of the documentation has its origins in the FuGE Hibernate STK documentation , which in turn was based on a number of sources, including the Milestone 3 documentation of SyMBA itself.

SyMBA utilizes Version 1.0 of the FuGE Standard. It provides a database and persistence layer based on the FuGE XSD, together with other helper classes. These are ideal for community developers wishing to store their FuGE-related data in a more structured way than XML files provide, and as a foundation to build FuGE tools or a FuGE-based system.

SyMBA is built with Apache Maven 2, a software project management and build system similar to, but more comprehensive than other build tools such as Apache Ant or GNU Make.

SyMBA provides:

  • a FuGE-structured relational database

  • a object-relational persistence and query layer

  • a set of Java classes representing FuGE XML entities and which also connect to the database via hyperjaxb3

  • unit tests of all queries and version retrieval

  • a user-friendly web interface that interacts with the FuGE database and the remote file store for storing raw data outputted from experimental assays

When you check-out the symba subversion repository, you will get a Maven2 project. Full instructions for compiling this project and generating and accessing the FuGE database are included in this documentation.