FungiRegEx Software documentation

This guide explains how to install, configure, and use FungiRegEx.

Software Name: FungiRegEx
Version: 1.0.0
Date: 



Preliminary steps

Before running the software, make sure that all the prerequisites are installed by running the following commands:
You will receive the complete project folder, including the code.
First, install all the dependencies by running the following command inside the project folder:
npm i
Then install nodemon globally:
npm install -g nodemon
Then install Chromium. The procedure depends on your OS, so check its documentation; on Windows, for example, you can use Chocolatey (note that this package installs Google Chrome):
choco install googlechrome --version=79.0.3945.117
After completing these steps, you are ready to run the front-end and back-end commands.

Software summary


This tool is a web-based search engine for regular expressions in proteomes. All the information is obtained from the JGI (Joint Genome Institute) database through a scraper that covers all the available species; the tool only considers fungal organisms.
In this version, the front-end uses React JS and the back-end uses Node.js + Express.
The number of Chromium instances that the computer opens in parallel is configurable, so it can be adjusted to the computer's resources.


Requirements


This software was developed and tested on computers with the following characteristics:
4 GB RAM
CentOS 7 and Windows 10
5th-generation Core i7 processor
Software versions:
Node.js v16.17.0
Chromium v79.0.3945.117
React JS v17.0.2
The project contains all the libraries needed to deploy it. Open a console (bash or the Windows console, depending on your OS) in the directory where you downloaded the application and run each of the following commands in its own console:
npm run start:frontend
npm run start:backend


Initial configuration


You can deploy this application either on a server (a local server, an external server, or another computer on your network) or on your own computer. Each user decides which option to use based on the available computing resources and the availability they need.

Take into account that if multiple users connect at the same time, the new user takes priority and interrupts the running tasks: the previous user's tasks are removed from the queue, and the new user's tasks are executed instead.
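The priority rule above can be modelled with a small queue. The following is a hypothetical sketch in JavaScript, not the actual FungiRegEx back-end code: the class SearchQueue, its methods, and the session IDs are illustrative names, and the sketch assumes that each user's search is represented as a list of pending tasks.

```javascript
// Hypothetical model of the documented behavior: when a new user starts a
// search, the previous user's pending tasks are dropped and only the new
// user's tasks remain queued. This is NOT the FungiRegEx source code.
class SearchQueue {
  constructor() {
    this.sessionId = null; // user/session that currently owns the queue
    this.pending = [];     // tasks waiting to be executed
  }

  // A new session replaces whatever was queued before it.
  startSession(sessionId, tasks) {
    const dropped = this.pending.length;
    this.sessionId = sessionId;
    this.pending = [...tasks];
    return dropped; // number of tasks discarded from the previous session
  }

  // Take the next task, or undefined when the queue is empty.
  next() {
    return this.pending.shift();
  }
}

const queue = new SearchQueue();
queue.startSession("user-A", ["id-1", "id-2", "id-3"]);
queue.next(); // user A's "id-1" is taken for execution

// User B connects: user A's remaining tasks ("id-2", "id-3") are discarded.
const discarded = queue.startSession("user-B", ["id-10", "id-11"]);
console.log(discarded);    // 2
console.log(queue.next()); // id-10
```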


Local server/local computer/external server


If you deploy the application on a server (which can be another computer), follow these instructions:

Get the IP address of the server or computer.
On Windows, run ipconfig in a Command Prompt and look for the "IPv4 Address" entry.
On Linux (for example, CentOS 7), run ip addr or hostname -I.
For other operating systems, look for the information on the official websites.
Navigate through the file system to the folder that contains the application.
Because the application runs on a computer other than the user's, you need to change the IP address parameters in the following files:
/../interfacebio/src/components/main/Main.jsx
/../interfacebio/src/components/table/Table.jsx
In Main.jsx, modify the IP address on lines 14, 15, and 117. For example, suppose that the external computer/server has the IP address 192.168.1.50. The code that you will find in the application is:
const [products, setProducts] = useState([])
const [progressVal, setProgressVal] = useState(0)
const [blocked, setBlocked] = useState(false);
const endpoint = "http://192.168.1.102:8000/file" //line 14
const progress = "http://192.168.1.102:8000/progress"//line 15
Change the IP address values to:
const [products, setProducts] = useState([])
const [progressVal, setProgressVal] = useState(0)
const [blocked, setBlocked] = useState(false);
const endpoint = "http://192.168.1.50:8000/file" //line 14
const progress = "http://192.168.1.50:8000/progress" //line 15
On line 117 you will find:
try {
await axios.post("http://192.168.1.102:8000/resultsScrap", {
stringtoBackend
})

}
Change the IP address value to:
try {
await axios.post("http://192.168.1.50:8000/resultsScrap", {
stringtoBackend
})

}

In Table.jsx, modify the IP address on line 10. Using the same example address (192.168.1.50), the code that you will find in the application is:
const endpoint = "http://192.168.1.102:3000/listaURLnew.json"
Change the IP address value to:
const endpoint = "http://192.168.1.50:3000/listaURLnew.json"
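Because the same address appears in several places, one way to reduce editing errors is to define it only once. The sketch below is a hypothetical refactor, not code that exists in the project: API_HOST, API_PORT, STATIC_PORT, and buildUrl are illustrative names, and the ports (8000 for the back-end, 3000 for the front-end) are taken from the snippets above.

```javascript
// Hypothetical refactor: keep the server address in one place so that only
// one line changes when the application moves to another machine.
// API_HOST, API_PORT, STATIC_PORT and buildUrl are illustrative names,
// not part of the current FungiRegEx code.
const API_HOST = "192.168.1.50"; // replace with your server's IP address
const API_PORT = 8000;           // back-end port used in Main.jsx
const STATIC_PORT = 3000;        // front-end port used in Table.jsx

function buildUrl(port, path) {
  return `http://${API_HOST}:${port}/${path}`;
}

const endpoint = buildUrl(API_PORT, "file");
const progress = buildUrl(API_PORT, "progress");
const resultsScrap = buildUrl(API_PORT, "resultsScrap");
const speciesList = buildUrl(STATIC_PORT, "listaURLnew.json");

console.log(endpoint);    // http://192.168.1.50:8000/file
console.log(speciesList); // http://192.168.1.50:3000/listaURLnew.json
```

In the React project, such constants could live in a shared module imported by both Main.jsx and Table.jsx. If the front-end was created with Create React App (this guide does not say), the host could also come from an environment variable whose name starts with REACT_APP_, which that tool embeds when it builds the app.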

Own computer


Get the IP address of your computer by following the steps in the previous section.
Then edit the same files and replace the IP address with your computer's IP address.
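As a cross-platform alternative to ipconfig or ip addr, Node.js (already required by the application) can list the machine's IPv4 addresses with the built-in os.networkInterfaces() function. This is a minimal sketch: the script and function names are only examples, and on machines with several network adapters you still have to choose the address that the other computers can reach.

```javascript
// Lists the external IPv4 addresses of this machine using Node's built-in
// os module. Run it with, for example: node show-ip.js
const os = require("os");

function listIPv4Addresses() {
  const result = [];
  const interfaces = os.networkInterfaces();
  for (const [name, addresses] of Object.entries(interfaces)) {
    for (const addr of addresses || []) {
      // Node 16 reports the family as "IPv4"; some Node 18 releases used 4.
      const isIPv4 = addr.family === "IPv4" || addr.family === 4;
      // Skip loopback/internal addresses such as 127.0.0.1.
      if (isIPv4 && !addr.internal) {
        result.push({ name, address: addr.address });
      }
    }
  }
  return result;
}

for (const { name, address } of listIPv4Addresses()) {
  console.log(`${name}: ${address}`);
}
```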


Cluster configuration


The application works by opening multiple tabs of the Chromium browser, obtaining the list of URLs, and sending requests to the JGI server. The number of tabs opened on the computer that runs the application is controlled by the maxConcurrency parameter, which determines how many resources are used.
With the hardware described in the Requirements section, we recommend no more than 50 instances. You can set this parameter in the file directory_where_you_download_the_application/interfacebio/backend.js.
In this file you will find the following code:
var cluster = await Cluster.launch({
concurrency: Cluster.CONCURRENCY_CONTEXT,
maxConcurrency: 30,
puppeteerOptions: {
headless: true,
defaultViewport: false,
args: ['--no-sandbox', '--disable-setuid-sandbox'],

},
});
maxConcurrency: 30 is the parameter to modify (no more than 50 is recommended).
The higher the value, the faster the search and the higher the resource consumption.
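The Cluster.launch call above appears to come from the puppeteer-cluster library. If you prefer not to edit backend.js every time, the value could be read from an environment variable. This is a hypothetical sketch: MAX_CONCURRENCY and resolveMaxConcurrency are illustrative names that the project does not define, and the default (30) and the cap (50) are taken from this section.

```javascript
// Hypothetical helper: read the number of parallel Chromium contexts from an
// environment variable instead of editing backend.js by hand.
// MAX_CONCURRENCY and resolveMaxConcurrency are illustrative names only.
const DEFAULT_CONCURRENCY = 30; // value currently used in backend.js
const RECOMMENDED_LIMIT = 50;   // upper bound recommended in this guide

function resolveMaxConcurrency(rawValue) {
  const parsed = Number.parseInt(rawValue, 10);
  if (!Number.isInteger(parsed) || parsed < 1) {
    return DEFAULT_CONCURRENCY; // missing or invalid value
  }
  return Math.min(parsed, RECOMMENDED_LIMIT); // never exceed the limit
}

const maxConcurrency = resolveMaxConcurrency(process.env.MAX_CONCURRENCY);
console.log(`Using maxConcurrency = ${maxConcurrency}`);
// The value would then be passed as maxConcurrency to Cluster.launch({ ... }).
```

With this change, on Linux you could start the back-end with MAX_CONCURRENCY=20 npm run start:backend; values above 50 would be reduced to 50.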

About regular expressions


To match a single "a" followed by zero or more "b"s followed by "c", you would use the pattern /ab*c/: the * after "b" means "0 or more occurrences of the preceding item." In the string "cbbabbbbcdebc", this pattern matches the substring "abbbbc".

If you need to use any of the special characters literally (for instance, to search for an actual "*"), you must escape it by putting a backslash in front of it. For instance, to search for "a" followed by "*" followed by "b", you would use /a\*b/: the backslash "escapes" the "*", making it literal instead of special.

Remember that in FungiRegEx you do not write the / at the beginning and at the end; just type the amino-acid pattern that you want to search for.
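The two examples above can be checked directly in Node.js. The last part of the sketch assumes, without confirmation from this guide, that the back-end builds the expression from the text you type using JavaScript's RegExp constructor with the g flag; toRegExp is a hypothetical helper name.

```javascript
// The two examples above, written as JavaScript regular expression literals.
const text = "cbbabbbbcdebc";
const firstMatch = text.match(/ab*c/)[0];
console.log(firstMatch); // abbbbc

// Escaping "*" with a backslash makes it a literal asterisk.
console.log(/a\*b/.test("a*b")); // true
console.log(/a\*b/.test("abb")); // false

// In FungiRegEx you type only the pattern body, without the slashes.
// Assumption: the back-end turns that text into a RegExp roughly like this.
function toRegExp(userPattern) {
  return new RegExp(userPattern, "g"); // "g" finds every occurrence
}
const typedByUser = "a\\*b"; // what the user types in the box: a\*b
console.log(toRegExp(typedByUser).test("xa*by")); // true
```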

Characters
x The character x
\\ The backslash character
\0n The character with octal value 0n (0 <= n <= 7)
\0nn The character with octal value 0nn (0 <= n <= 7)
\0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh The character with hexadecimal value 0xhh
\uhhhh The character with hexadecimal value 0xhhhh
\t The tab character ('\u0009')
\n The newline (line feed) character ('\u000A')
\r The carriage-return character ('\u000D')
\f The form-feed character ('\u000C')
\a The alert (bell) character ('\u0007')
\e The escape character ('\u001B')
\cx The control character corresponding to x
Character classes
[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)
[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
Predefined character classes
. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
Boundary matchers
^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
\A The beginning of the input
\G The end of the previous match
\Z The end of the input but for the final terminator, if any
\z The end of the input
Greedy quantifiers
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times
Reluctant quantifiers
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n but not more than m times
Possessive quantifiers
X?+ X, once or not at all
X*+ X, zero or more times
X++ X, one or more times
X{n}+ X, exactly n times
X{n,}+ X, at least n times
X{n,m}+ X, at least n but not more than m times
Logical operators
XY X followed by Y
X|Y Either X or Y
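The tables above follow the syntax of Java's java.util.regex.Pattern class. The sketch below assumes, without confirmation from this guide, that the Node.js back-end evaluates patterns with JavaScript's RegExp. Under that assumption, most constructs behave as listed, but Java-only syntax such as possessive quantifiers (X*+), class union and intersection ([a-d[m-p]], [a-z&&[def]]), and \A, \G, \Z, \z, \a, \e does not have the meaning shown in the tables. The sequences are invented for illustration, and isValidInJavaScript is a hypothetical helper.

```javascript
// Character classes: N, then any residue except P, then S or T, then not P
// (the N-glycosylation motif N-{P}-[ST]-{P}).
const glycoSites = "MNGSANPTVNKTQ".match(/N[^P][ST][^P]/g);
console.log(glycoSites); // [ 'NGSA', 'NKTQ' ]

// Greedy vs reluctant quantifiers.
const greedy = "FAAAK".match(/FA+/)[0];     // as many "A" as possible
const reluctant = "FAAAK".match(/FA+?/)[0]; // as few "A" as possible
console.log(greedy, reluctant); // FAAA FA

// Bounded repetition and alternation.
const bounded = "KKRKAAA".match(/[KR]{2,3}/)[0];
const signal = "MAGSHDEL".match(/KDEL|HDEL/)[0];
console.log(bounded, signal); // KKR HDEL

// A possessive quantifier such as A*+ is not valid JavaScript syntax.
function isValidInJavaScript(pattern) {
  try {
    new RegExp(pattern);
    return true;
  } catch (err) {
    return false; // SyntaxError: the pattern cannot be compiled
  }
}
console.log(isValidInJavaScript("FA*"));  // true
console.log(isValidInJavaScript("FA*+")); // false
```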


How-to guide


Before running the application, make sure that the following are installed:
Node.js v16.17.0
A stable version of the Chromium browser
If you do not have Node.js installed, you can get it from: https://nodejs.org/en/
If you do not have Chromium installed, you can get it from: https://www.chromium.org/getting-involved/download-chromium/
Choose the download that matches your OS.
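To confirm the installed Node.js version before starting, you can run node -v, or use the small script below, which compares process.versions.node with 16.17.0. It is a sketch that assumes any newer version is also acceptable, which this guide does not confirm; the isAtLeast helper and the script name are illustrative.

```javascript
// Checks that the running Node.js version is at least 16.17.0, the version
// used to develop FungiRegEx. Run it with, for example: node check-node.js
const REQUIRED = [16, 17, 0];

function isAtLeast(version, required) {
  const parts = version.split(".").map(Number);
  for (let i = 0; i < required.length; i++) {
    const part = parts[i] || 0;
    if (part > required[i]) return true;
    if (part < required[i]) return false;
  }
  return true; // versions are equal
}

const installed = process.versions.node; // e.g. "16.17.0", without the "v"
if (isAtLeast(installed, REQUIRED)) {
  console.log(`Node.js ${installed} OK`);
} else {
  console.log(`Node.js ${installed} is older than ${REQUIRED.join(".")}`);
}
```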


Step 1 - Open two consoles and run the commands to start the web-based application


In the first console, run the following command to start the front-end:
npm run start:frontend
You can see the output of this command in Figure 1.

In the second console, run the following command to start the back-end:
npm run start:backend
You can see the output of this command in Figure 2.
To stop the application, press Ctrl+C in each console.


Open two consoles or bash instances and keep them open:
It is important to watch their output, because it shows whether an error occurs during execution. Note: the React app runs in non-optimized (development) mode. To deploy an optimized application, run:
npm run build
This command creates an optimized version of the application.


Step 3 - Open the web-based application


After steps 1 and 2, open your browser and go to localhost:3000 or the address shown in the console window.
In Figure 1, the console shows the address http://localhost:3000 or http://192.168.1.102:3000, but the address can vary, so make sure that you have the right one and type it into the browser's address bar.
As Figure 3 shows, the interface consists of 3 main components:
Introduction
Table of supported species
A contact form
You can search for the scientific name of a species, download the table data, or filter the species.


Step 4 - Go and use FungiRegEx




The main interface consists of:
Type of search:
Globally: searches for the regular expression in all the species.

Specific species: searches for the regular expression in the proteome of a specific species.
Range or list of IDs:
You can set the range of IDs for the tool to search in.

In this version, 2,402 different species are available in the selector.
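The range option defines which protein IDs are queried (for example, IDs 1 to 1000 in Step 6). The sketch below is hypothetical: the accepted input formats ("1-1000" or "12, 45, 78") and the parseIds function are assumptions for illustration, not the actual FungiRegEx form logic.

```javascript
// Hypothetical sketch: turn a range ("1-1000") or a comma-separated list
// ("12, 45, 78") into the protein IDs that would be queried.
// The input format and parseIds are assumptions, not FungiRegEx code.
function parseIds(input) {
  const text = input.trim();
  const range = text.match(/^(\d+)\s*-\s*(\d+)$/);
  if (range) {
    const start = Number(range[1]);
    const end = Number(range[2]);
    if (start > end) throw new RangeError("start must not exceed end");
    // Inclusive range: 1-1000 yields 1000 IDs.
    return Array.from({ length: end - start + 1 }, (_, i) => start + i);
  }
  // Otherwise treat the input as a list of IDs separated by commas,
  // ignoring empty or non-numeric entries.
  return text
    .split(",")
    .map((s) => s.trim())
    .filter((s) => /^\d+$/.test(s))
    .map(Number);
}

console.log(parseIds("1-1000").length); // 1000
console.log(parseIds("12, 45, 78"));    // [ 12, 45, 78 ]
```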


Step 5 - Write the Regular Expression


This tool makes regular expressions easy to write, because they are written directly as the sequence that the user wants to find. To learn more about the supported regular expressions, see the section "About regular expressions".

For example, to match a single "FA" followed by zero or more "A"s, you would use the pattern FAA*: the * after "A" means "0 or more occurrences of the preceding item."
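The FAA* example can be tried on a made-up sequence in Node.js to see which substrings the pattern captures. The sequence is invented for illustration.

```javascript
// FAA* means: "F", then "A", then zero or more additional "A"s.
// The "g" flag returns every non-overlapping match.
const sequence = "MKFAAAGFAFQ";
const matches = sequence.match(/FAA*/g);
console.log(matches); // [ 'FAAA', 'FA' ]
// The final "F" (followed by "Q") is not a match: FAA* needs at least one "A".
```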


Step 6 - Search


After completing all the previous steps, click the Search button; if any step is missing, the application will not do anything.
Also, consider that the search can take a while to start; as soon as results are obtained, they are displayed in the table.
The search speed depends on several factors:
Load on the JGI servers
User's Internet speed
Computer resources
Cluster configuration (the number of parallel instances); this parameter is set in the backend.js file, see the section "Cluster configuration"
For example, a search of IDs 1 to 1000 in Trichoderma aethiopicum CBS130628 v1.0 using the regular expression FAA* gives the following results:

The results are displayed in the following column order:
Species
ID
Scaffold / Version
Proteome
#matches
Matches
If the query to the JGI server for the species and identifier is empty, that is, JGI does not have the proteome, NO_SHORTNAME, NO DATA is displayed.

During the search, results are added to the table as they arrive. You can sort them by the number of matches, and the table also contains the proteome.

Note that requests to the JGI server sometimes return nothing; when a request returns an empty result, the application does not show it in the table. For this reason, in the previous search of 1000 identifiers, only 995 results were obtained. This does not happen often, and it is unrelated to how this tool works.
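The #matches and Matches columns can be read as the number and the list of non-overlapping matches of the pattern in each proteome. The sketch below is hypothetical: buildResultRow, sortByMatches, the field names, and the example sequences are illustrative; it assumes that the pattern is evaluated with JavaScript's RegExp, and it does not model the NO_SHORTNAME, NO DATA case.

```javascript
// Hypothetical sketch of one result row: count the matches of the user's
// pattern in a proteome and keep the matched substrings.
function buildResultRow(species, id, version, proteome, pattern) {
  const regex = new RegExp(pattern, "g"); // matchAll requires the "g" flag
  const matches = Array.from(proteome.matchAll(regex), (m) => m[0]);
  return { species, id, version, proteome, count: matches.length, matches };
}

// Rows with the most matches first, as when ordering the table by #matches.
function sortByMatches(rows) {
  return [...rows].sort((a, b) => b.count - a.count);
}

const rows = [
  buildResultRow("Species A", 1, "v1.0", "MFAAGFA", "FAA*"),
  buildResultRow("Species A", 2, "v1.0", "MKLQ", "FAA*"),
  buildResultRow("Species A", 3, "v1.0", "FAFAFAAA", "FAA*"),
];
for (const row of sortByMatches(rows)) {
  console.log(row.id, row.count, row.matches.join(","));
}
// 3 3 FA,FA,FAAA
// 1 2 FAA,FA
// 2 0
```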




Support

This application was developed by Technotronic Engineer Victor Miguel Terron M. with advice from Ph.D. Miguel Angel Canseco Pérez. You can contact us:




