Web software developed by Victor Miguel Terrón Macias together with various authors, each cited in the corresponding module.
Previous steps
Before running the software, make sure you meet the prerequisites and run the following commands:
The complete code is distributed as a compressed folder, so you need to have the following downloaded and installed:
Python 3.11.
7-Zip or unzip.
Unzip the folder.
Using the console of your operating system, navigate to the folder where you downloaded the software, as shown in the images:
Windows.
Path of the folder where Bio-Suite-T was downloaded.
MacOS.
Path of the folder where Bio-Suite-T was downloaded.
Once inside the folder, write the following commands to activate the virtual environment:
Windows:
cd env
cd Scripts
activate
cd..
cd..
MacOS
source env/scripts/activate
It is required to install all the necessary dependencies for the project to work, therefore run the following command:
pip install -r requirements.txt
On MacOS, the installation can fail because of incompatibilities with certain packages. If this happens, install each package that reports an error separately, using the following syntax:
pip install "package_name"
If the problem persists, try to skip the installation of said package by deleting it from the requirements.txt file. If problems still persist at this stage, you can contact me directly at the following email: victor.terron@cimat.mx
After installing the packages you can run the program using the following command:
python .\manage.py runserver
The software should show you the following output on the console:
Console output.
After executing the commands you are ready to run Bio-Suite-T.
127.0.0.1:8000
Software Overview
Bio-Suite-T logo.
This tool is a web application that brings together a set of tools for the analysis of proteomic and genomic sequences. It focuses mainly on algae and fungi, but it is not limited to them and can analyze other sequences as well. It has 16 tools, including:
Analysis of properties of DNA or nucleotide sequences.
Analysis of protein sequence properties.
Sequence transcription.
Reverse transcription of sequences.
Sequence translation.
Sequence alignment.
BLAST.
PDB file viewer.
PDB file analyzer.
Calculation and generation of phylogenetic trees.
MOTIFS sequence analysis.
Search for conserved regions at the genomic and proteomic level of algae and fungi using regular expressions.
Fungi protein database information viewer (requires MongoDB and MongoDB Atlas, with the databases available from JGI or NCBI, among others, loaded).
Fungi genome database information viewer (requires MongoDB and MongoDB Atlas, with the databases available from JGI or NCBI, among others, loaded).
Algal protein database information viewer (requires MongoDB and MongoDB Atlas, with the databases available from JGI or NCBI, among others, loaded).
Algal genome database information viewer (requires MongoDB and MongoDB Atlas, with the databases available from JGI or NCBI, among others, loaded).
This version uses Django as the front-end and back-end development framework, together with JavaScript.
Requirements to run the software (Recommended)
The software has been developed and tested on a computer with the following characteristics:
8 GB RAM
CentOS 7, Windows 10 and macOS as operating systems.
Note: macOS has incompatibilities with the BLAST module, so you may not be able to run this module on that operating system.
Processor: AMD Ryzen 5 5600H.
Software requirements:
Python 3.11.
MongoDB Compass.
Chrome as the recommended browser.
The project contains everything needed to be deployed on a server if required. To start the application, run the following command.
Genomic and proteomic data from the non-relational database MongoDB
The genome and proteome data are required to run the database modules. The data belongs to its respective authors and has been downloaded for some species. You must therefore create a non-relational document database in MongoDB called fungiRegExAlgal and then import the files available at the following link:
As mentioned, the application can be deployed either locally or on a server. The choice is up to the user and depends on their needs and the available computational resources.
Keep in mind that if multiple users connect at the same time, they will interrupt each other's tasks, so the application is intended for one user at a time. If a second user queues tasks while another user's tasks are running, priority is given to the most recent user and the previous user's tasks are discarded.
Local Server/Local Computer/External Server
If you deploy the application on a local server (which can be another computer), follow these instructions:
For other distributions, look for information on the official site.
Browse the file system until you find the root folder containing the project.
Run the application with its respective command.
python .\manage.py runserver
Own computer
If you are running the application on your own computer, the IP address does not change, so enter http://127.0.0.1:8000/ in your browser. Firewall configuration depends on each user. When the application runs locally, it is not necessary to open ports to external users.
Features
Function 1. Analysis of DNA properties
In this functionality, you must enter a nucleotide sequence as input. From this, the tool will generate and show you a series of relevant DNA properties of the entered sequence.
Among these properties, you will find:
Complement of the sequence.
Reverse complementary sequence.
In its representations in decimal and percentage format.
Distribution of amino acids present in the sequence.
A visual and informative pie chart illustrating the distribution of (A) adenine, (C) cytosine, (G) guanine, and (T) thymine in the sequence.
Summary
Input:
The user provides the DNA sequence.
Process:
The software analyzes the DNA sequence.
Output:
Complement of the sequence.
Reverse complementary sequence.
In its representations in decimal and percentage format.
Distribution of amino acids present in the sequence.
A visual and informative pie chart illustrating the distribution of (A) adenine, (C) cytosine, (G) guanine, and (T) thymine in the sequence.
Example output.
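The outputs listed above can be approximated in a few lines of plain Python. This is a minimal sketch, not the code used by Bio-Suite-T: the function name dna_properties and the returned dictionary keys are hypothetical, and the input is assumed to be a non-empty string containing only A, C, G and T.

```python
from collections import Counter

# Watson-Crick pairing used to build the complementary strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def dna_properties(sequence: str) -> dict:
    """Return complement, reverse complement and base composition (hypothetical helper)."""
    seq = sequence.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G and T")
    complement = seq.translate(COMPLEMENT)
    counts = Counter(seq)
    return {
        "complement": complement,
        "reverse_complement": complement[::-1],
        # Absolute counts and percentages, the data a pie chart would be drawn from.
        "counts": {base: counts.get(base, 0) for base in "ACGT"},
        "percent": {base: 100 * counts.get(base, 0) / len(seq) for base in "ACGT"},
    }


props = dna_properties("ATGCGC")
print(props["complement"])          # TACGCG
print(props["reverse_complement"])  # GCGCAT
print(props["counts"])              # {'A': 1, 'C': 2, 'G': 2, 'T': 1}
```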
Function 2. Analysis of Protein properties
In this functionality, the application allows you to enter a protein sequence. Additionally, you have the option to modify the pH of the sequence. Based on this data, the system will automatically generate a series of important parameters related to the entered proteome, such as:
Proteome size.
Molecular weight.
Aromaticity.
Instability index.
Isoelectric point.
Secondary structure.
Molecular extinction coefficient.
Extinction coefficient with reduced cysteines.
Number of disulfide bridges.
Hydropathicity index (GRAVY).
Electrical charge at a specific pH.
Flexibility data.
Distribution of elements in the sequence.
K&D hydrophobicity.
To make this data easier to interpret, the tool also includes graphs that help you visualize the information.
Summary
Input:
The user provides the protein sequence.
The pH at which the charge is calculated.
Process:
The software analyzes the protein sequence.
Output:
Proteome size.
Molecular weight.
Aromaticity.
Instability index.
Isoelectric point.
Secondary structure.
Molecular extinction coefficient.
Extinction coefficient with reduced cysteines.
Number of disulfide bridges.
Hydropathicity index (GRAVY).
Electrical charge at a specific pH.
Flexibility data.
Distribution of elements in the sequence.
KB/KV relationships.
Example output
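Two of the parameters listed above have simple definitions that can be sketched directly. The example assumes GRAVY is the mean Kyte-Doolittle hydropathy value over all residues and aromaticity is the fraction of F, W and Y residues. How Bio-Suite-T computes them internally is not stated here, and the function names are hypothetical. The sequence is assumed to be non-empty and to contain only the 20 standard amino acids.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Mean hydropathy of the sequence; raises KeyError on non-standard residues."""
    seq = protein.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aromaticity(protein: str) -> float:
    """Fraction of aromatic residues (F, W, Y) in the sequence."""
    seq = protein.upper()
    return sum(seq.count(aa) for aa in "FWY") / len(seq)


print(aromaticity("FWYA"))  # 0.75
```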
Function 3. Transcription to mRNA
From a sequence of nucleotides or DNA, the tool is capable of generating the transcript corresponding to the mRNA, as well as providing the template DNA. In addition, it makes it easier for you to view the DNA template strands in both directions, both 5' to 3' and 3' to 5'.
Summary
Input:
The user provides a DNA or nucleotide sequence.
Process:
The software analyzes the sequence.
Output:
DNA transcription to mRNA.
DNA template strands both 5' to 3' and 3' to 5'.
Example output
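The transcription step can be illustrated as follows. The sketch assumes the sequence entered is the coding strand written 5' to 3', so the mRNA is the same sequence with T replaced by U and the template strand is its complement. Whether Bio-Suite-T uses this convention is an assumption, and the function name and dictionary keys are hypothetical.

```python
# Watson-Crick pairing for DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe(coding_dna: str) -> dict:
    """Return the mRNA and the template strand in both reading directions."""
    coding = coding_dna.upper()
    complement = coding.translate(COMPLEMENT)
    return {
        "mrna": coding.replace("T", "U"),
        "template_3_to_5": complement,        # written under the coding strand
        "template_5_to_3": complement[::-1],  # the same strand read the other way
    }


result = transcribe("ATGGCC")
print(result["mrna"])             # AUGGCC
print(result["template_3_to_5"])  # TACCGG
print(result["template_5_to_3"])  # GGCCAT
```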
Function 4. Reverse transcription
This functionality allows you to enter an mRNA sequence. Once entered, the system will automatically process this sequence and perform a reverse transcription, generating the corresponding DNA or nucleotide sequence.
Summary
Input:
The user provides an mRNA sequence.
Process:
The software analyzes the sequence.
Output:
Reverse transcription mRNA to DNA.
Example output
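As a rough illustration, reverse transcription at the sequence level amounts to replacing U with T. The sketch below returns the DNA equivalent of the mRNA (the coding strand); it is an assumption that this, rather than the complementary cDNA strand, is what the tool reports. The function name is hypothetical.

```python
def back_transcribe(mrna: str) -> str:
    """Return the DNA coding-strand equivalent of an mRNA sequence."""
    seq = mrna.upper()
    if set(seq) - set("ACGU"):
        raise ValueError("sequence must contain only A, C, G and U")
    return seq.replace("U", "T")


print(back_transcribe("AUGGCC"))  # ATGGCC
```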
Function 5. Translation of mRNA or DNA into protein
This functionality allows you to enter an mRNA or DNA sequence. You can select any of the 27 available codon tables or use the standard default table. Once you have entered the sequence and selected the codon table, the system will identify and display the sequence type. It will also provide the corresponding translation of the mRNA or DNA into protein and the details of the codon table you selected.
Summary
Input:
The user provides an mRNA or DNA sequence.
The user selects any of the 27 available codon tables or the standard one.
Process:
The software analyzes the sequence.
Output:
Sequence type.
Translation of the mRNA or DNA sequence into protein.
Codon table selected for translation.
Example output
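Translation with the standard codon table can be sketched as below. Only the standard table is shown; the other selectable tables are not reproduced, and the way Bio-Suite-T performs the translation is not documented here. The names CODON_TABLE and translate are hypothetical. Stop codons are written as "*", and trailing bases that do not form a complete codon are ignored.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence: str) -> str:
    """Translate a DNA or mRNA sequence read from its first base."""
    dna = sequence.upper().replace("U", "T")  # accept mRNA as well as DNA
    usable = len(dna) - len(dna) % 3          # drop an incomplete final codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(translate("ATGGCCTAA"))  # MA*
print(translate("AUGUUUGGG"))  # MFG
```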
Function 6. Pairwise sequence alignment
The pairwise sequence alignment feature allows the user to compare two nucleotide or DNA sequences to identify similarities and differences. Below are the steps to use this feature:
Sequence Entry:
The user must enter the two nucleotide or DNA sequences to compare and can select local or global alignment, depending on their needs.
Optional Parameters:
Percentage of Match and Non-Match:
These values are optional and are set by default. If the user wishes to customize these parameters, they can do so to fit their specific criteria.
Penalty Matrix:
The penalty matrix is an optional parameter. If the user wants to apply a custom penalty matrix, they can select this option.
Results Display:
After running the alignment, the tool will provide the following results:
Match percentage.
Non-match percentage.
Sequence alignment.
Penalty Matrix (Optional):
If the penalty matrix option is selected, the tool will display the matrix used for the alignment calculation.
Example output
Summary:
Input:
The user provides two nucleotide or DNA sequences for comparison.
Process:
The software analyzes the sequences, allowing the user to choose between local or global alignment.
Optional parameters can be adjusted, such as percentage of match, non-match, and a custom penalty matrix.
Output:
Match percentage between the sequences.
Non-match percentage.
The visual alignment of the sequences.
Optional: Display of the penalty matrix used in the alignment calculation.
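To make the global alignment option concrete, here is a compact Needleman-Wunsch sketch with a linear gap penalty. It is an illustration under stated assumptions, not the algorithm Bio-Suite-T necessarily uses: the scoring values (match=1, mismatch=-1, gap=-1), the function name global_align and the identity calculation are all hypothetical choices. Both sequences are assumed to be non-empty.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (aligned_a, aligned_b, score, percent_identity) for a global alignment."""

    def sub(i: int, j: int) -> int:
        # Score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the dynamic-programming matrix.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1

    aligned_a, aligned_b = "".join(reversed(top)), "".join(reversed(bottom))
    matches = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return aligned_a, aligned_b, score[-1][-1], 100 * matches / len(aligned_a)


print(global_align("ACGT", "AGT"))  # ('ACGT', 'A-GT', 2, 75.0)
```

A local alignment (Smith-Waterman) differs mainly in clamping cell scores at zero and starting the traceback from the highest-scoring cell.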
Function 7. BLAST
The BLAST function makes it easy for the user to scan the NCBI database to identify sequences similar to the given sequence. To start the analysis, the user simply enters the sequence of nucleotides, proteins or genes they wish to study. BLAST will perform a search of the NCBI database, identifying and presenting a list of sequences that share similarities with the one provided by the user. This set of results provides a broad and detailed view of possible matches in the NCBI database.
Summary
Input:
The user provides a sequence of nucleotides, proteins or genes.
Process:
BLAST searches the NCBI database to identify similar sequences.
Output:
BLAST presents a list of sequences found in the NCBI database that share similarities with the sequence provided by the user.
Function 8. PDB Viewer
The PDB Viewer is a tool that allows the user to explore and view files in PDB (Protein Data Bank) format. The operation is described below:
PDB File Upload:
The user must upload a file in PDB format to begin viewing. This file will contain detailed information about the three-dimensional structure.
Display Settings:
Once the file is loaded, a three-dimensional structure can be displayed. The PDB Viewer offers several options to customize the display, including:
Visualization by lines, crosses, spheres or points.
Entanglement control to facilitate interpretation of complex structures.
Atom labeling for quick identification.
Change of colors to highlight different components of the structure.
Exploration and Navigation:
The user can explore the loaded three-dimensional structure, zooming, rotating and panning to gain a complete understanding of the spatial arrangement of the elements.
Summary
Input:
The user uploads a file in PDB format.
Process:
Once the file is loaded, the PDB Viewer allows you to view the three-dimensional structure.
Customization options are provided for the display, such as choosing between lines, crosses, spheres, or dots.
Provides tools for entanglement control, atom labeling, and color changing to improve interpretation of complex structures.
Output:
The user can explore the loaded three-dimensional structure using actions such as zoom, rotate, and pan.
Example output
Function 9. Analysis of PDBs
This feature provides a detailed evaluation of the information contained in a Protein Data Bank (PDB) file. The user must load a PDB file. Once it is loaded, the tool shows different aspects of the structure, including general data, bibliographic information, composition of structures, missing residues, additional information, physical properties, specific atoms and heteroatoms.
Summary
Input:
The user uploads a PDB file.
Process:
The tool performs a detailed evaluation of various aspects of the entered structure.
Output:
General data:
Name of the structure.
Deposit date.
Release date.
Resolution of the structure in Ångströms.
Bibliographic Information:
Structure keywords.
Structure determination method.
Structure reference.
Reference to the journal where it was published.
Authors.
Structure Composition:
Chemical compound.
Origin of the structure.
Missing Residues:
Presence of missing residues (boolean).
List of missing residues.
Additional Information:
Glycosylation information.
List of models present.
List of chains.
List of residues.
Name and coordinates of the atoms.
Name and coordinates of the atoms.
Physical Properties:
B-factor.
Distribution of elements.
Distribution of atoms.
Specific Atoms:
Detailed information about atoms, including their name and coordinates.
Heteroatoms:
Distribution of heteroatoms in the structure.
Example output
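Several of the outputs above (atom names, coordinates, B-factor, element distribution) come from the fixed-width ATOM and HETATM records of a PDB file. The sketch below reads those columns with plain string slicing. It is illustrative only: Bio-Suite-T may rely on a dedicated parser, and format_atom and parse_atoms are hypothetical helpers. The sample records are generated by format_atom so that the column positions are exact.

```python
def format_atom(serial, name, res, chain, resseq, x, y, z, bfactor, element):
    """Build a fixed-width ATOM record; used here only to create sample data."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{bfactor:6.2f}          {element:>2}"
    )


def parse_atoms(lines):
    """Extract name, residue, chain, coordinates, B-factor and element per atom."""
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip header, remark and other record types
        atoms.append({
            "name": line[12:16].strip(),
            "residue": line[17:20].strip(),
            "chain": line[21],
            "residue_number": int(line[22:26]),
            "coords": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            "b_factor": float(line[60:66]),
            "element": line[76:78].strip(),
        })
    return atoms


sample = [
    "HEADER    SAMPLE DATA FOR ILLUSTRATION",
    format_atom(1, " N", "MET", "A", 1, 27.340, 24.430, 2.614, 9.67, "N"),
    format_atom(2, " CA", "MET", "A", 1, 26.266, 25.413, 2.842, 10.38, "C"),
]
for atom in parse_atoms(sample):
    print(atom["name"], atom["coords"], atom["b_factor"])
```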
Function 10. Phylogenetic trees
The Phylogenetic Trees feature allows users to analyze and visualize evolutionary relationships between biological sequences using Clustal format files.
The user uploads the file and the tool performs a distance calculation between these sequences, generating a matrix that reflects the evolutionary differences. It also shows a phylogenetic tree.
Example output
Summary
Input:
The user uploads a file in Clustal format.
Process:
The tool calculates the distances between the loaded sequences, generating a matrix that reflects the evolutionary differences.
Output:
The tool presents a phylogenetic tree.
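The distance calculation can be pictured with the simplest possible measure, the p-distance: the fraction of aligned positions at which two sequences differ. The text does not say which distance model the tool applies, so this choice is an assumption, as are the function names. The input is a dictionary of already aligned sequences of equal length, as would be read from a Clustal file; parsing the file itself is not shown.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions, ignoring columns where either side has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in compared) / len(compared)


def distance_matrix(aligned: dict) -> dict:
    """Return a nested dict with the pairwise distance for every pair of names."""
    names = list(aligned)
    return {n: {m: p_distance(aligned[n], aligned[m]) for m in names} for n in names}


alignment = {"s1": "ACGT", "s2": "ACGA", "s3": "A-GA"}
matrix = distance_matrix(alignment)
print(matrix["s1"]["s2"])  # 0.25
print(matrix["s2"]["s3"])  # 0.0
```

A tree-building method such as neighbor joining or UPGMA would then take this matrix as its input.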
Function 11. MOTIFS Analysis
This function supports detailed analysis of DNA sequences and of their biological function by identifying the most common nucleotide sequence in a set of related MOTIFS sequences. This can be useful for identifying conserved regions in promoters or other regions.
Summary
Input:
The user uploads a MOTIFS sequence for analysis.
Process:
The tool performs its analysis.
Output:
Consensus.
Degenerate consensus.
Reverse complement.
Consensus of the reverse complement.
Example output
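The consensus and reverse complement outputs can be sketched in plain Python. The example assumes the motif is given as a list of equal-length DNA instances and takes the most frequent base in each column as the consensus. The degenerate consensus is not covered, and the function names and sample instances are illustrative rather than taken from Bio-Suite-T.

```python
from collections import Counter

# Watson-Crick pairing for DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus(instances) -> str:
    """Most frequent base per column across equal-length motif instances."""
    if len({len(s) for s in instances}) != 1:
        raise ValueError("instances must be non-empty and share one length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*instances)
    )


def reverse_complement(seq: str) -> str:
    """Complement the sequence and read it in the opposite direction."""
    return seq.translate(COMPLEMENT)[::-1]


instances = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
motif = consensus(instances)
print(motif)                      # TACGC
print(reverse_complement(motif))  # GCGTA
```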
Function 12. Regular Expressions
In the genomic and proteomic context, regular expressions are powerful tools that allow searching and analysis on DNA, RNA, and protein sequences.
They make it possible to search for MOTIFS in DNA in order to identify binding sites for transcription factors and regulatory elements, among others. Another application is the identification of phosphorylation and glycosylation sites, among others.
Example output
Summary
Input:
The user writes a regular expression to search at either the genomic or proteomic level.
Process:
The tool searches for the regular expression in the sequence.
Output:
List of organisms, sequences and identifiers of the matches.
Number of matches within the sequence.
About Regular Expressions
To match a single "a" followed by zero or more "b" followed by "c", you would use the pattern /ab*c/: the * after "b" means "0 or more occurrences of the previous element." In the string "cbbabbbbcdebc", this pattern will match the substring "abbbbc".
If you need to use any of the special characters literally (actually looking for a "*", for example), you must escape it by placing a backslash in front of it. For example, to search for "a" followed by "*" followed by "b", you would use /a\*b/ — the backslash "escapes" the "*", making it literal rather than special.
For more information about regular expressions, see:
Note that you do not have to write the / at the beginning and end; enter only the pattern itself, for example the amino acids you want to search for.
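The two patterns discussed above can be checked with Python's re module. This is a sketch only: it assumes the application's search behaves like re, which is not confirmed here, and re does not support every construct in the tables that follow (for example, character class intersection with &&). The N-glycosylation pattern N[^P][ST][^P], the sample protein and the helper find_motif are illustrative and not part of Bio-Suite-T.

```python
import re

# "a", then zero or more "b", then "c": the first match in the sample string.
print(re.search(r"ab*c", "cbbabbbbcdebc").group())  # abbbbc

# A backslash makes "*" literal instead of a quantifier.
print(re.search(r"a\*b", "xa*by").group())  # a*b

# Illustrative protein motif: N, any residue except P, S or T, any residue except P.
SEQUON = re.compile(r"N[^P][ST][^P]")


def find_motif(pattern, sequence):
    """Return (start index, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


print(find_motif(SEQUON, "MKNGSAANPTQ"))  # [(2, 'NGSA')]
```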
Characters
x         The character x
\\        The backslash character
\0n       The character with octal value 0n (0 <= n <= 7)
\0nn      The character with octal value 0nn (0 <= n <= 7)
\0mnn     The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh      The character with hexadecimal value 0xhh
\uhhhh    The character with hexadecimal value 0xhhhh
\t        The tab character ('\u0009')
\n        The newline (line feed) character ('\u000A')
\r        The carriage-return character ('\u000D')
\f        The form-feed character ('\u000C')
\a        The alert (bell) character ('\u0007')
\e        The escape character ('\u001B')
\cx       The control character corresponding to x
Character classes
[abc]            a, b, or c (simple class)
[^abc]           Any character except a, b, or c (negation)
[a-zA-Z]         a through z or A through Z, inclusive (range)
[a-d[m-p]]       a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]     d, e, or f (intersection)
[a-z&&[^bc]]     a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    a through z, and not m through p: [a-lq-z] (subtraction)
Predefined character classes
.     Any character (may or may not match line terminators)
\d    A digit: [0-9]
\D    A non-digit: [^0-9]
\s    A whitespace character: [ \t\n\x0B\f\r]
\S    A non-whitespace character: [^\s]
\w    A word character: [a-zA-Z_0-9]
\W    A non-word character: [^\w]
Boundary matchers
^     The beginning of a line
$     The end of a line
\b    A word boundary
\B    A non-word boundary
\A    The beginning of the input
\G    The end of the previous match
\Z    The end of the input but for the final terminator, if any
\z    The end of the input
Greedy quantifiers
X?        X, once or not at all
X*        X, zero or more times
X+        X, one or more times
X{n}      X, exactly n times
X{n,}     X, at least n times
X{n,m}    X, at least n but not more than m times
Reluctant quantifiers
X??        X, once or not at all
X*?        X, zero or more times
X+?        X, one or more times
X{n}?      X, exactly n times
X{n,}?     X, at least n times
X{n,m}?    X, at least n but not more than m times
Possessive quantifiers
X?+        X, once or not at all
X*+        X, zero or more times
X++        X, one or more times
X{n}+      X, exactly n times
X{n,}+     X, at least n times
X{n,m}+    X, at least n but not more than m times
Logical operators
XY     X followed by Y
X|Y    Either X or Y
Use guide
Before running the application, make sure you have completed the previous steps and meet the requirements to run the software. Also decide whether you will deploy the application on your own computer, on a local server, or on an external server.
Note: Remember to take into account the observations about macOS.
To stop the application, press Ctrl+C in the console where it is running.
Check the console output, because it will indicate whether there is a problem running the application.
Note: If you want to deploy the application on a server, you have to edit the settings file located at: /biosuite/settings.py
Then edit the DEVELOPMENT_MODE section and set the parameters according to your configuration.
Step 1 - Unzip the application
Step 2 – Install App Requirements
Step 3 - Launch the application
Step 4 – Open the app
After steps 1, 2 and 3, open your browser and enter 127.0.0.1:8000, or the IP address of your local or external server, in the address bar.
You will then see a screen like the following.
Bio-Suite-T application user interface.
Next, click Menu > Login and sign in with a social account. A Google account is required. Your data is protected in accordance with current applicable regulations. The data of each user is the sole responsibility of that user: by receiving a copy of the software, the user accepts this responsibility and exempts the authors of the software from liability for any damage, loss or theft of sensitive information. The data collected includes name and UID, among others.
Step 5 - Use one of the application modules
Once you have signed in, you will see the application dashboard, from which you can access any of the modules described in the Features section.
Bio-Suite-T application dashboard.
Frequently asked questions
Answer and document frequently asked questions here:
Approach
This application focuses on sequence analysis at different levels.
Questions, queries and support
For any questions related to the use of the software, please contact me by the following means:
“AUUUUCUUUGCUCUUGAGCUCUGGCACUUCUCUGCUGCUGUCUG” Data from NIH, Homo sapiens Genomic DNA transcription.
[1] Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000), "A greedy algorithm for aligning DNA sequences", J Comput Biol 2000; 7(1-2):203-14.
[2] Aleksandr Morgulis, George Coulouris, Yan Raytselis, Thomas L. Madden, Richa Agarwala, Alejandro A. Schäffer (2008), "Database Indexing for Production MegaBLAST Searches", Bioinformatics 24:1757-1764.
[3] Aleksandr Morgulis, George Coulouris, Yan Raytselis, Thomas L. Madden, Richa Agarwala, Alejandro A. Schäffer (2008), "Database Indexing for Production MegaBLAST Searches", Bioinformatics 24:1757-1764.
[4] Stephen F. Altschul, John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis, Alejandro A. Schäffer, and Yi-Kuo Yu (2005) "Protein database searches using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.
Support
For support or questions about the use of the software, contact me at the following email: