The ever-increasing number of sequenced genomes presents an exciting opportunity to discover highly conserved gene families of unknown function and then characterize them experimentally. We have detected such gene families across the kingdom Fungi (see Methods at the bottom of this page) and invite the international research community to functionally characterize their individual members and propagate their annotations across the Fungal Tree of Life. Investigators can register, log in, click on any cluster below (the number on the left), and add notes to any protein in the list, along with the methods used for functional characterization.

Conserved Gene Families of Unknown Function
Total Families: 142
Total Genes: 79,713
Total Unique Species: 1,282
Total Annotated Genes: 33
Total Unique PFAM Domains: 170
Total Unique Uniprot: 110
Total Unique PDB: 64
Updated: 2020-11-06
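| # | #Genes | Expressed Genes | %Genes with Phenotypes | Unique Species | Proteins with PFAM Domains | Unique PFAM Domains Count | Protein PFAM Domains | Uniprot HMM Hint | PDB HMM Hint | AlphaFold pLDDT | Foldseek PDB Hint | Conserved In | User Curated Models | Avg. Protein Length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1,298 | | | 343 | 774 | 1 | Uncharacterized alpha/beta hydrolase domain (DUF2235): 774 | | | | | Bacteria | 0 | 504 |
| 2 | 1,240 | | | 1,101 | 1,238 | 1 | 2-hydroxy-palmitic acid dioxygenase Mpo1-like: 1,238 | | | | | Eukaryota | 2 | 197 |
| 3 | 1,222 | | | 1,146 | 703 | 2 | Protein of unknown function (DUF2408): 702; MIF4G domain: 1 | | | | | Viridiplantae | 0 | 461 |
| 4 | 1,219 | | | 1,185 | 1,219 | 1 | Uncharacterised protein family (UPF0160): 1,219 | | | | | Eukaryota | 0 | 352 |
| 5 | 1,154 | | | 1,138 | 509 | 7 | HEAT repeat: 494; HEAT-like repeat: 9; Adaptin N terminal region: 4; HEAT repeats: 1; Damage-control phosphatase ARMT1-like domain: 1; Alpha-L-rhamnosidase N-terminal domain: 1; DNA/RNA non-specific endonuclease: 1 | | | | | Viridiplantae | 0 | 2,039 |
| 6 | 1,127 | | | 1,110 | 1,127 | 3 | Domain of unknown function (DUF383): 1,126; Domain of unknown function (DUF384): 1,122; SecE/Sec61-gamma subunits of protein translocation complex: 1 | | | | | Opisthokonta + Viridiplantae | 0 | 390 |
| 7 | 1,120 | | | 1,093 | 1,119 | 1 | Hikeshi-like, N-terminal domain: 1,119 | | | | | Opisthokonta + Viridiplantae | 0 | 214 |
| 8 | 1,116 | | | 1,051 | 1,116 | 1 | 2-phospho-L-lactate transferase CofD: 1,116 | | | | | Eukaryota | 0 | 496 |
| 9 | 1,072 | | | 1,045 | 1,053 | 1 | CXXC motif containing zinc binding protein, eukaryotic: 1,053 | | | | | Opisthokonta | 1 | 162 |
| 10 | 1,080 | | | 961 | 560 | 1 | Protein of unknown function (DUF4449): 560 | | | | | Viridiplantae | 0 | 866 |
| 11 | 1,083 | | | 1,071 | 209 | 6 | HEAT repeat: 202; Cse1: 4; Ankyrin repeats (3 copies): 1; Ssl1-like: 1; RNase P subunit Pop3: 1; Ankyrin repeat: 1 | | | | | Viridiplantae | 1 | 1,118 |
| 12 | 1,033 | | | 1,015 | 981 | 9 | TPR repeat: 481; Tetratricopeptide repeat: 454; Tetratricopeptide repeat: 336; Tetratricopeptide repeat: 37; Tetratricopeptide repeat: 10; Tetratricopeptide repeat: 7; NUDIX domain: 4; Tetratricopeptide repeat: 4; Tetratricopeptide repeat: 2 | | | | | Eukaryota | 0 | 936 |
| 13 | 1,023 | | | 714 | 5 | 1 | Lipocalin / cytosolic fatty-acid binding protein family: 5 | | | | | | 0 | 194 |
| 14 | 1,024 | | | 820 | 565 | 1 | Domain of unknown function (DUF4149): 565 | | | | | Viridiplantae | 1 | 182 |
| 15 | 1,014 | | | 758 | 1,013 | 1 | Gig2-like: 1,013 | | | | | Universal | 0 | 477 |
| 16 | 936 | | | 928 | 2 | 2 | L27 domain: 1; Ribosomal protein L7Ae/L30e/S12e/Gadd45 family: 1 | | | | | Viridiplantae | 0 | 510 |
| 17 | 927 | | | 673 | 567 | 5 | Protein of unknown function (DUF3292): 566; Integral peroxisomal membrane peroxin: 1; Plant phosphoribosyltransferase C-terminal: 1; HIUase/Transthyretin family: 1; Profilin: 1 | | | | | | 0 | 674 |
| 18 | 857 | | | 833 | 854 | 2 | NFACT protein RNA binding domain: 854; YacP-like NYN domain: 1 | | | | | Eukaryota | 0 | 215 |
| 19 | 857 | | | 809 | 857 | 1 | Protein of unknown function (DUF1348): 857 | | | | | Universal | 0 | 157 |
| 20 | 789 | | | 777 | 782 | 2 | Eukaryotic integral membrane protein (DUF1751): 782; Rhomboid family: 1 | | | | | Viridiplantae | 0 | 370 |
| 21 | 794 | | | 633 | 485 | 1 | Questin oxidase-like: 485 | | | | | Viridiplantae | 0 | 419 |
| 22 | 762 | | | 738 | 446 | 2 | Uncharacterized protein conserved in bacteria (DUF2264): 445; Transcription factor WhiB: 1 | | | | | Prokaryotes | 0 | 649 |
| 23 | 746 | | | 736 | 709 | 6 | UBA/TS-N domain: 508; TPR repeat: 364; Tetratricopeptide repeat: 52; Tetratricopeptide repeat: 22; DnaJ domain: 4; Chaperonin 10 Kd subunit: 1 | | | | | Opisthokonta + Viridiplantae | 1 | 913 |
| 24 | 751 | | | 733 | 0 | 0 | | | | | | | 0 | 119 |
| 25 | 748 | | | 740 | 748 | 3 | Protein adenylyltransferase SelO: 748; Ankyrin repeats (3 copies): 1; Leucine Rich Repeat: 1 | | | | | Universal | 0 | 633 |
| 26 | 745 | | | 701 | 726 | 1 | Protein of unknown function (DUF1769): 726 | | | | | Viridiplantae | 0 | 323 |
| 27 | 725 | | | 715 | 716 | 1 | Armadillo-like helical domain-containing protein 3, C-terminal: 716 | | | | | Opisthokonta + Viridiplantae | 0 | 644 |
| 28 | 720 | | | 713 | 14 | 1 | Bridge-like lipid transfer protein family member 1, C-terminal: 14 | | | | | Viridiplantae + Bacteria | 1 | 3,197 |
| 29 | 712 | | | 707 | 637 | 3 | THADA/TRM732, DUF2428: 384; HEAT repeat: 255; HEAT-like repeat: 4 | | | | | | 0 | 1,599 |
| 30 | 707 | | | 700 | 700 | 1 | PAT complex subunit CCDC47: 700 | | | | | Viridiplantae + Bacteria | 1 | 436 |
| 31 | 699 | | | 630 | 412 | 3 | Fungal protein of unknown function (DUF1752): 412; SGF29 tudor-like domain: 5; Protein of unknown function (DUF3295): 1 | | | | | Viridiplantae | 0 | 559 |
| 32 | 703 | | | 701 | 270 | 6 | Ankyrin repeat: 162; Ankyrin repeats (many copies): 106; Ankyrin repeats (3 copies): 1; Zinc finger, C2H2 type: 1; Ubiquitin interaction motif: 1; Ankyrin repeat: 1 | | | | | Viridiplantae | 1 | 646 |
| 33 | 683 | | | 674 | 387 | 2 | DENN domain-containing protein 11: 387; Domain of unknown function (DUF4484): 387 | | | | | Viridiplantae | 0 | 606 |
| 34 | 677 | | | 669 | 255 | 2 | Protein of unknown function (DUF3712): 254; BTB/POZ domain: 1 | | | | | Viridiplantae | 0 | 855 |
| 35 | 661 | | | 658 | 622 | 3 | WD domain, G-beta repeat: 622; Nup133 N terminal like: 1; Nucleoporin Nup120/160: 1 | | | | | Viridiplantae | 0 | 510 |
| 36 | 634 | | | 600 | 33 | 2 | NAD dependent epimerase/dehydratase family: 25; NAD(P)H-binding: 8 | | | | | | 0 | 300 |
| 37 | 659 | | | 653 | 659 | 1 | Protein of unknown function (DUF1295): 659 | | | | | Universal | 0 | 363 |
| 38 | 658 | | | 654 | 51 | 1 | Protein of unknown function (DUF4449): 51 | | | | | Viridiplantae | 0 | 743 |
| 39 | 659 | | | 645 | 5 | 1 | SKG6 family: 5 | | | | | Opisthokonta | 0 | 713 |
| 40 | 652 | | | 646 | 5 | 3 | Putative peptidoglycan binding domain: 2; Tim10/DDP family zinc finger: 2; Myosin tail: 1 | | | | | Viridiplantae | 1 | 876 |
| 41 | 655 | | | 634 | 17 | 6 | Galactose oxidase, central domain: 6; Kelch motif: 6; Glycophorin A: 2; Herpesvirus glycoprotein D/GG/GX domain: 1; Kelch motif: 1; Galactose oxidase, central domain: 1 | | | | | Viridiplantae | 0 | 751 |
| 42 | 649 | | | 643 | 0 | 0 | | | | | | Viridiplantae | 1 | 966 |
| 43 | 651 | | | 637 | 0 | 0 | | | | | | | 0 | 351 |
| 44 | 649 | | | 645 | 1 | 1 | SOS response associated peptidase (SRAP): 1 | | | | | | 0 | 390 |
| 45 | 651 | | | 648 | 0 | 0 | | | | | | Viridiplantae | 0 | 563 |
| 46 | 648 | | | 638 | 2 | 1 | S25 ribosomal protein: 2 | | | | | Viridiplantae | 0 | 768 |
| 47 | 629 | | | 467 | 2 | 1 | Fungal protein of unknown function (DUF1774): 2 | | | | | | 0 | 273 |
| 48 | 646 | | | 644 | 0 | 0 | | | | | | | 1 | 553 |
| 49 | 641 | | | 637 | 485 | 10 | Tetratricopeptide repeat: 247; Tetratricopeptide repeat: 111; TPR repeat: 61; Tetratricopeptide repeat: 51; Tetratricopeptide repeat: 10; Tetratricopeptide repeat: 9; Tetratricopeptide repeat: 5; Tetratricopeptide repeat: 3; Tetratricopeptide repeat: 2; Tetratricopeptide repeat: 2 | | | | | Viridiplantae | 0 | 322 |
| 50 | 645 | | | 640 | 0 | 0 | | | | | | | 0 | 274 |
| 51 | 645 | | | 641 | 0 | 0 | | | | | | | 0 | 302 |
| 52 | 645 | | | 640 | 0 | 0 | | | | | | Viridiplantae | 0 | 328 |
| 53 | 634 | | | 630 | 5 | 1 | ATPase family associated with various cellular activities (AAA): 5 | | | | | Viridiplantae | 0 | 368 |
| 54 | 641 | | | 633 | 3 | 2 | Metallo-beta-lactamase superfamily: 2; Mitochondrial carrier protein: 1 | | | | | Viridiplantae + Bacteria | 0 | 1,694 |
| 55 | 638 | | | 631 | 346 | 1 | WD domain, G-beta repeat: 346 | | | | | Viridiplantae | 0 | 423 |
| 56 | 637 | | | 630 | 0 | 0 | | | | | | Viridiplantae | 3 | 453 |
| 57 | 638 | | | 631 | 350 | 3 | Armadillo/beta-catenin-like repeat: 324; HEAT repeat: 56; HEAT repeats: 4 | | | | | Opisthokonta + Viridiplantae | 0 | 1,020 |
| 58 | 634 | | | 629 | 0 | 0 | | | | | | | 3 | 433 |
| 59 | 633 | | | 623 | 0 | 0 | | | | | | Viridiplantae | 0 | 429 |
| 60 | 635 | | | 627 | 1 | 1 | Ubiquitin fusion degradation protein UFD1: 1 | | | | | Viridiplantae + Cryptophyta | 1 | 841 |
| 61 | 636 | | | 619 | 0 | 0 | | | | | | | 0 | 946 |
| 62 | 638 | | | 635 | 360 | 1 | DnaJ homologue, subfamily C, member 28, conserved domain: 360 | | | | | Viridiplantae | 0 | 522 |
| 63 | 629 | | | 624 | 349 | 1 | PF08217: 349 | | | | | Viridiplantae | 0 | 828 |
| 64 | 631 | | | 627 | 0 | 0 | | | | | | | 0 | 455 |
| 65 | 629 | | | 626 | 2 | 1 | 30S ribosomal protein subunit S22 family: 2 | | | | | Viridiplantae | 2 | 448 |
| 66 | 625 | | | 622 | 10 | 1 | SnoaL-like polyketide cyclase: 10 | | | | | Viridiplantae | 0 | 551 |
| 67 | 624 | | | 621 | 357 | 1 | Domain of unknown function (DUF4078): 357 | | | | | Viridiplantae | 0 | 358 |
| 68 | 624 | | | 612 | 1 | 1 | Protein of unknown function (DUF4030): 1 | | | | | | 0 | 138 |
| 69 | 622 | | | 618 | 30 | 1 | Telomere attrition and p53 response 1 protein-like: 30 | | | | | | 0 | 264 |
| 70 | 620 | | | 617 | 0 | 0 | | | | | | Viridiplantae | 0 | 643 |
| 71 | 619 | | | 616 | 0 | 0 | | | | | | | 0 | 369 |
| 72 | 620 | | | 617 | 0 | 0 | | | | | | Bacteria | 0 | 357 |
| 73 | 621 | | | 615 | 0 | 0 | | | | | | | 0 | 441 |
| 74 | 621 | | | 614 | 1 | 1 | PQ loop repeat: 1 | | | | | | 0 | 1,106 |
| 75 | 616 | | | 606 | 2 | 2 | Plethodontid receptivity factor PRF: 1; emp24/gp25L/p24 family/GOLD: 1 | | | | | Viridiplantae | 0 | 1,298 |
| 76 | 618 | | | 615 | 0 | 0 | | | | | | | 0 | 559 |
| 77 | 619 | | | 613 | 0 | 0 | | | | | | Viridiplantae + Bacteria | 0 | 499 |
| 78 | 619 | | | 610 | 361 | 1 | Domain of unknown function (DUF4452): 361 | | | | | Viridiplantae | 0 | 195 |
| 79 | 621 | | | 615 | 361 | 1 | Domain of unknown function (DUF4604): 361 | | | | | | 0 | 171 |
| 80 | 616 | | | 610 | 6 | 2 | Bombesin-like peptide: 5; Serine incorporator (Serinc): 1 | | | | | | 0 | 303 |
| 81 | 615 | | | 608 | 202 | 3 | GATA zinc finger: 195; AT hook motif: 11; Fungal Zn(2)-Cys(6) binuclear cluster domain: 6 | | | | | Viridiplantae + Bacteria | 0 | 1,233 |
| 82 | 614 | | | 609 | 351 | 1 | Protein of unknown function (DUF2418): 351 | | | | | Viridiplantae | 0 | 461 |
| 83 | 612 | | | 610 | 1 | 1 | AIR carboxylase: 1 | | | | | | 0 | 598 |
| 84 | 614 | | | 608 | 130 | 5 | Tetratricopeptide repeat: 125; Coatomer epsilon subunit: 2; Tetratricopeptide repeat: 1; Tetratricopeptide repeat: 1; Tetratricopeptide repeat: 1 | | | | | Viridiplantae | 0 | 440 |
| 85 | 611 | | | 607 | 0 | 0 | | | | | | | 0 | 549 |
| 86 | 608 | | | 605 | 0 | 0 | | | | | | | 0 | 908 |
| 87 | 610 | | | 607 | 3 | 2 | SAP domain: 2; Rho termination factor, N-terminal domain: 1 | | | | | | 0 | 353 |
| 88 | 545 | | | 342 | 1 | 1 | Liver-expressed antimicrobial peptide 2 precursor (LEAP-2): 1 | | | | | | 0 | 335 |
| 89 | 529 | | | 476 | 529 | 1 | Uncharacterized protein family UPF0016: 529 | | | | | Eukaryota | 1 | 276 |
| 90 | 490 | | | 471 | 488 | 1 | Integral membrane protein DUF92: 488 | | | | | Eukaryota | 0 | 309 |
| 91 | 485 | | | 314 | 1 | 1 | COPI associated protein: 1 | | | | | | 0 | 277 |
| 92 | 469 | | | 454 | 469 | 1 | Uncharacterised protein family UPF0047: 469 | | | | | Universal | 0 | 141 |
| 93 | 439 | | | 427 | 77 | 4 | Ankyrin repeat: 59; Ankyrin repeats (many copies): 11; Ankyrin repeat: 6; Ankyrin repeats (many copies): 1 | | | | | Opisthokonta | 1 | 659 |
| 94 | 447 | | | 416 | 447 | 1 | Protein of unknown function (DUF1295): 447 | | | | | Universal | 0 | 359 |
| 95 | 433 | | | 423 | 431 | 1 | PAT complex subunit CCDC47: 431 | | | | | | 1 | 373 |
| 96 | 418 | | | 408 | 222 | 1 | Uncharacterized conserved protein (DUF2340): 222 | | | | | Eukaryota | 1 | 142 |
| 97 | 414 | | | 380 | 0 | 0 | | | | | | | 1 | 263 |
| 98 | 409 | | | 350 | 22 | 2 | PhoD-like phosphatase: 21; Cytochrome c oxidase subunit Vb: 1 | | | | | Opisthokonta + Viridiplantae + Bacteria | 0 | 700 |
| 99 | 401 | | | 374 | 64 | 2 | Armadillo/beta-catenin-like repeat: 63; HEAT repeat: 1 | | | | | | 0 | 676 |
| 100 | 395 | | | 387 | 332 | 4 | WD domain, G-beta repeat: 330; WD40-like Beta Propeller Repeat: 4; Eukaryotic translation initiation factor eIF2A: 1; 60s Acidic ribosomal protein: 1 | | | | | Eukaryota | 0 | 503 |
| 101 | 383 | | | 373 | 382 | 2 | Eukaryotic integral membrane protein (DUF1751): 382; Der1-like family: 1 | | | | | | 2 | 356 |
| 102 | 385 | | | 374 | 197 | 2 | Cell wall protein YJL171C/Tos1, C-terminal: 197; Cell wall protein YJL171C/Tos1, N-terminal: 197 | | | | | Viridiplantae + Archaea | 1 | 467 |
| 103 | 381 | | | 329 | 43 | 2 | GDSL-like Lipase/Acylhydrolase family: 36; GDSL-like Lipase/Acylhydrolase: 7 | | | | | | 1 | 567 |
| 104 | 372 | | | 357 | 0 | 0 | | | | | | | 1 | 318 |
| 105 | 370 | | | 366 | 92 | 1 | Armadillo/beta-catenin-like repeat: 92 | | | | | | 0 | 696 |
| 106 | 375 | | | 365 | 375 | 2 | Protein adenylyltransferase SelO: 375; Rpp14/Pop5 family: 3 | | | | | Universal | 0 | 725 |
| 107 | 355 | | | 352 | 2 | 1 | SPT2 chromatin protein: 2 | | | | | | 0 | 3,248 |
| 108 | 352 | | | 348 | 1 | 1 | AT hook motif: 1 | | | | | | 0 | 993 |
| 109 | 341 | | | 335 | 0 | 0 | | | | | | | 0 | 785 |
| 110 | 337 | | | 317 | 1 | 1 | Kelch motif: 1 | | | | | | 0 | 388 |
| 111 | 338 | | | 318 | 0 | 0 | | | | | | | 0 | 636 |
| 112 | 342 | | | 328 | 88 | 2 | BTB/POZ domain: 48; MATH domain: 41 | | | | | | 0 | 610 |
| 113 | 333 | | | 309 | 0 | 0 | | | | | | | 0 | 472 |
| 114 | 331 | | | 315 | 1 | 1 | UreD urease accessory protein: 1 | | | | | | 0 | 873 |
| 115 | 330 | | | 319 | 0 | 0 | | | | | | | 1 | 441 |
| 116 | 326 | | | 312 | 0 | 0 | | | | | | | 0 | 237 |
| 117 | 321 | | | 319 | 186 | 1 | Protein of unknown function (DUF2418): 186 | | | | | | 0 | 390 |
| 118 | 318 | | | 308 | 139 | 2 | V-type proton ATPase subunit S1, luminal domain: 138; PF08319: 1 | | | | | | 0 | 283 |
| 119 | 320 | | | 315 | 0 | 0 | | | | | | | 0 | 375 |
| 120 | 313 | | | 309 | 12 | 2 | Arrestin (or S-antigen), C-terminal domain: 9; Eukaryotic protein of unknown function (DUF1764): 3 | | | | | | 0 | 663 |
| 121 | 299 | | | 271 | 170 | 2 | Protein of unknown function (DUF3712): 170; Ribosomal RNA adenine dimethylase: 1 | | | | | Bacteria | 0 | 2,401 |
| 122 | 305 | | | 290 | 152 | 2 | Ykl077w/Psg1 (Pma1 Stabilization in Golgi): 151; Amino acid permease: 1 | | | | | | 0 | 545 |
| 123 | 190 | | | 56 | 22 | 1 | MAPEG family: 22 | | | | | | 0 | 173 |
| 124 | 178 | | | 155 | 177 | 1 | Coiled-coil domain-containing protein 90-like: 177 | | | | | Opisthokonta | 1 | 204 |
| 125 | 155 | | | 100 | 155 | 1 | CYRIA/CYRIB Rac1 binding domain: 155 | | | | | Eukaryota | 0 | 328 |
| 126 | 107 | | | 76 | 81 | 5 | Protein of unknown function (DUF3684): 78; Histidine kinase-, DNA gyrase B-, and HSP90-like ATPase: 12; CUE domain: 4; Histidine kinase-, DNA gyrase B-, and HSP90-like ATPase: 2; TLD: 1 | | | | | Eukaryota | 0 | 1,680 |
| 127 | 83 | | | 55 | 0 | 0 | | | | | | | 0 | 262 |
| 128 | 92 | | | 84 | 0 | 0 | | | | | | | 0 | 589 |
| 129 | 82 | | | 79 | 82 | 1 | Uncharacterised protein family (UPF0203): 82 | | | | | Universal | 0 | 83 |
| 130 | 66 | | | 62 | 0 | 0 | | | | | | | 0 | 281 |
| 131 | 66 | | | 61 | 2 | 1 | MIOREX complex component 7: 2 | | | | | | 0 | 70 |
| 132 | 68 | | | 55 | 18 | 1 | Uncharacterised protein (DUF2406): 18 | | | | | | 0 | 190 |
| 133 | 59 | | | 27 | 1 | 1 | Cytochrome c oxidase assembly protein PET191: 1 | | | | | | 0 | 537 |
| 134 | 44 | | | 30 | 0 | 0 | | | | | | | 0 | 380 |
| 135 | 37 | | | 33 | 9 | 1 | Armadillo/beta-catenin-like repeat: 9 | | | | | | 0 | 943 |
| 136 | 36 | | | 36 | 1 | 1 | YLP motif: 1 | | | | | | 0 | 557 |
| 137 | 34 | | | 34 | 6 | 3 | Ankyrin repeats (many copies): 3; Ankyrin repeat: 2; Glycolipid 2-alpha-mannosyltransferase: 1 | | | | | Viridiplantae | 0 | 818 |
| 138 | 28 | | | 26 | 6 | 1 | Perilipin family: 6 | | | | | | 0 | 298 |
| 139 | 27 | | | 26 | 16 | 2 | Meiotically up-regulated protein Msb1/Mug8 domain: 12; RhoGAP domain: 4 | | | | | | 0 | 1,410 |
| 140 | 27 | | | 25 | 4 | 3 | Leucine Rich Repeat: 2; Leucine Rich Repeat: 1; Leucine Rich repeat: 1 | | | | | | 0 | 905 |
| 141 | 26 | | | 26 | 10 | 1 | PF13345: 10 | | | | | | 0 | 652 |
| 142 | 25 | | | 25 | 0 | 0 | | | | | | | 0 | 346 |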

Methods

Over 18 million proteins encoded in 1,282 fungal genomes from MycoCosm were clustered into families using cascaded MMseqs2 clustering with default parameters (Steinegger et al., 2017). Our subset of 142 clusters has the following three properties. Each is:

An individual family member may have manual curations retrieved from MycoCosm, or functional domains not shared with the rest of its family. A family as a whole may also show similarity to distant protein families in UniProt or the Protein Data Bank (PDB), as found by pairwise HMM-based HHblits searches (Steinegger et al., 2019) against the non-redundant Uniprot20_2016 (defined by <20% sequence identity) and PDB70 (defined by <70% sequence identity) sets of protein sequences. Such distantly related proteins are presented in the list as "hints" (the 'Uniprot HMM Hint' and 'PDB HMM Hint' columns).
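For readers who want to try a comparable analysis on their own data, the two tool invocations described in these Methods can be sketched as follows. This is a minimal illustration and not the pipeline used to build this page: the helper functions `mmseqs_cluster_cmd` and `hhblits_cmd`, all file names, the database prefix, and the iteration and CPU counts are assumptions made for the example. The sketch only assembles the command lines (MMseqs2's `easy-cluster` workflow, and HHblits with its `-i`, `-d`, `-o`, `-n`, and `-cpu` options) and prints them, so it runs without either tool installed.

```python
import shlex


def mmseqs_cluster_cmd(fasta, out_prefix, tmp_dir):
    """Build an MMseqs2 command that clusters a FASTA file with default parameters."""
    # `easy-cluster` is the MMseqs2 workflow that performs cascaded clustering.
    return ["mmseqs", "easy-cluster", fasta, out_prefix, tmp_dir]


def hhblits_cmd(query_msa, database, out_hhr, iterations=1, cpus=4):
    """Build an HHblits command that searches one family against an HMM database."""
    return [
        "hhblits",
        "-i", query_msa,        # family alignment (or sequence) used as the query
        "-d", database,         # database prefix, e.g. a PDB70 or Uniprot20 build
        "-o", out_hhr,          # hit list written in HHR format
        "-n", str(iterations),  # number of search iterations (assumed value)
        "-cpu", str(cpus),      # number of threads (assumed value)
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(shlex.join(mmseqs_cluster_cmd("fungal_proteins.fasta", "clusters", "tmp")))
    print(shlex.join(hhblits_cmd("family_0001.a3m", "pdb70", "family_0001_pdb70.hhr")))
```

To actually run a command, the returned list can be passed to `subprocess.run(cmd, check=True)`. Building the argument list separately keeps the example inspectable and avoids shell quoting problems.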

References

  1. Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017 Nov;35(11):1026-1028. doi: 10.1038/nbt.3988. Epub 2017 Oct 16. PMID: 29035372.
  2. Steinegger M, Meier M, Mirdita M, Vöhringer H, Haunsberger SJ, Söding J. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics. 2019 Sep 14;20(1):473. doi: 10.1186/s12859-019-3019-7. PMID: 31521110; PMCID: PMC6744700.