Molecular Evolution of RAMOSA1 (RA1) in Land Plants

Article


RAMOSA1 (RA1) is a Cys2-His2-type (C2H2) zinc finger transcription factor that controls plant meristem fate and identity and has played an important role in maize domestication. Despite its importance, the origin of RA1 is unknown, and the evolution in plants is only partially understood. In this paper, we present a well-resolved phylogeny based on 73 amino acid sequences from 48 embryophyte species. The recovered tree topology indicates that, during grass evolution, RA1 arose from two consecutive SUPERMAN duplications, resulting in three distinct grass sequence lineages: RA1-like A, RA1-like B, and RA1; however, most of these copies have unknown functions. Our findings indicate that RA1 and RA1-like play roles in the nucleus despite lacking a traditional nuclear localization signal. Here, we report that copies diversified their coding region and, with it, their protein structure, suggesting different patterns of DNA binding and protein–protein interaction. In addition, each of the retained copies diversified regulatory elements along their promoter regions, indicating differences in their upstream regulation. Taken together, the evidence indicates that the RA1 and RA1-like gene families in grasses underwent subfunctionalization and neofunctionalization enabled by gene duplication.

Results

Phylogenetic Analysis

SUP Evolution and the Origin of RA1

The BLASTp searches recovered SUP sequences along most of the embryophyte species, except from M. polymorpha and S. moellendorffii. To identify the origin of RA1 and understand its evolution, we reconstructed a phylogenetic tree with amino acid sequences obtained from genomes of grasses and other embryophytes. For that, we used SUP sequences from non-seeds plants (P. patens and C. purpureus) as outgroups. The recovered tree topology, and the presence of a single copy sequence of the grass P. latifolia suggests that RA1 originated from two successive duplications of SUP that occurred at the split of the BOP and PACMAD clade (Figure 1). As a result of these duplications, three lineages of sequences were generated; one of them includes RA1 sequences, and the other two are here named arbitrarily RA1-like A and RA1-like B. In this tree, the evolutionary relationships among RA1 and RA1-like remain unclear.
Figure 1. SUP evolution and the origin of RA1. Majority rule consensus tree (N = 22502 trees) of RA1 transcription factor in embryophytes generated by Bayesian inference using 73 peptide sequences (Figure S1 and Table S1). Black dots indicate Bayesian posterior probability (PP) = [0.9 to 1]. Green arrows indicate gene duplication events. Black asterisk points out proteins with known function (Table S2).

Evolution of RA1 in Grasses

To gain resolution on the relationship between RA1 and RA1-like, we reconstructed a new phylogeny with amino acid sequences obtained from 16 genomes of grass species using as an outgroup the SUP sequence from the monocot J. ascendence (Figure 2a, Figures S2, and S3). The obtained tree topology suggests the possibility that (1) RA1 was originated after a second duplication around the split of the BOP and PACMAD clades, (2) RA1 is sister to the RA1-like B proteins, (3) RA1 was lost in the BOP clade and currently is present in the PACMAD clade, (4) RA1-like A and RA1-like B are found in species of both BOP and PACMAD clades, and (5) species-specific duplications and loss of some copies of RA1 and RA1-like suggest a complex evolution of these molecules during the diversification of the grasses (Figure 2a and Figure S4). Overall, SUPRA1, and RA1-like are single-copy genes, except for the two RA1 variants found in S. bicolor, as previously described [2] (Figure 2a).
Figure 2. Evolution of RA1 in grasses. (a) Majority rule consensus tree (N = 22502 trees) of RA1 and RA1-like transcription factors in grass species generated by Bayesian inference using 35 amino acid sequences (Figures S2 and S3 and Table S1). Each clade is identified by a color. Black dots indicate Bayesian posterior probability (PP) = [0.9 to 1]. Black asterisk points out proteins with known function (Table S3). (b) Motif distribution patterns on RA1 and RA1-like amino acid sequences in grass species. Colored boxes represent motif occurrence (Figures S7 and S8). (c) Sequence representation logo of Motif 1 obtained from the multiple sequence alignment (Figure S9). Black arrowhead indicates the amino acid variant (A- > G) in Motif 1.
Genome annotations indicate that SUP is on Chr3:8242256…8243372 of the Arabidopsis genome, whereas RA1RA1-like A, and RA1-like B are on maize chromosomes Chr7:114958642…114959398, Chr5:68312034…68312578, and Chr10:103427449…103428364, respectively. Chromosomal collinearity visualization maps indicate that Arabidopsis chromosome 3 lacks synteny with maize chromosomes 7, 5, and 10 (Figure S5). In contrast, RA1 and RA1-like genes are placed in syntenic chromosome blocks among maize, Setaria, and rice (Figure S6). These results suggest that RA1 and RA1-like are syntenic paralogs.

Conserved Motifs Analysis among RA1 and RA1-Like Coding Sequences

To discover specific features that characterize the coding region of each lineage, we performed a motif analysis (Figure 2b). We searched for 15 different motifs using MEME to better understand sequence plasticity (Figures S7 and S8). Some of them are lineage-specific (for instance, Motif 7 (WPPPQVRS), Motif 8 (PPPNPNPSCTVLDL), Motif 11 (FPWPPQ), Motif 12 (VVCSCSST), and Motif 13 (MESRSAARAGDQQH)), and others are shared by lineages, such as Motif 3 (ARAPJPNLNYSPPHPA), which is present in RA1-like A and RA1-like B sequences, whereas Motif 4 (APPVVYSFFSLAASA) is shared by RA1 and RA1-like B sequences (Figure 2b and Figure S7). All the sequences have in common the presence of one C2H2 zinc finger domain (Motif 1 (SSSSSYTCGYCKREFRSAQALGGHMNVHRRDRARLRHGQSP)) (Figure 2b and Figure S9). In particular, sequences from RA1 lineage have the variant QGLGGH in the zinc finger domain, as it was previously reported [2] (Figure 2c). In addition, Motif 2 (GDGAEEGLDLELRLG) appeared two to three times towards the C-terminal of all the sequences. Our analysis identified two Motif 2 along the C-terminal of the RA1-like sequences, whereas three Motif 2 were detected on most of the RA1 sequences of the PACMAD clade (Figure 2b and Figure S9). In addition, we found that the number of Leu (L) and the distances between the Motifs 2 vary among clades (Figure 2b and Figure S9). According to the literature, Motif 2 is a putative EAR-like repressor motif [2,41,42,43,44,45].

Subcellular Localization

To determine the in vivo subcellular localization of the RA1 and RA1-like proteins of maize and S. viridis, we generated a N-terminal translational fusion with GFP. The 35S:GFP:RA1 and 35S:GFP:RA1-like constructs were analyzed by a transient expression system in agro-infiltrated young tobacco leaves. The fluorescence was observed through a laser scanning confocal microscope. DAPI staining was used to visualize nuclei localization. As shown in Figure 3, the RA1 and RA1-like proteins are localized in the nuclei of tobacco epidermal cells.
Figure 3. Sub-cellular localization analysis of RA1 and RA1-like proteins. Sub-cellular localization of the RA1-GFP, SvRA1-GFP, SvRA1-like A-GFP, and SvRA1-like B-GFP constructs in tobacco leaf epidermal cells. Green color is GFP protein signal. Blue color represents DAPI-stained nucleus.

Molecular Modeling

Proteins’ Secondary Structure and Conformational Plasticity

The secondary structure properties of the SUP/RA1 members are poorly characterized [33,46]. Only the zinc finger C2H2-type of SUP has had its secondary structure experimentally solved by NMR [33]. So, to gain perspective on protein structure among SUP, RA1, and RA1-like, secondary structures predictions were carried out using the PsiPred Workbench (http://bioinf.cs.ucl.ac.uk/psipred) and its PsiPred 4.0 tool [29]. We also used the server’s tool DisoPred 3.1 [30] to predict regions that cannot be assigned a known secondary structure, which are termed as disordered regions. The same algorithm also predicts regions with high probability of binding other proteins. These properties are graphically shown in Figure 4 and Figure S10 and Table S4 as a function of the residue number/name. Overall, the N-terminal region of SUP, RA1, and RA1-like is predicted as disordered with protein binding affinities. Allocated within the N-terminal disordered segment are common motifs shared between lineages, such as Motif 4 and 5 (Figure 2b). The N-terminal disordered region is followed by a region defined as a coil that continues with the C2H2-type domain. The structural predictions suggest the zinc finger to be formed by a β-sheet (two consecutive β-strands) followed by an α-helix (Table S4). In general, the C-terminal of RA1 and RA1-like is presented as a disordered region with protein binding affinities. Allocated within the C-terminal disordered segment is Motif 3 shared by RA1-like A and RA1-like B (Figure 2b). In addition, we found stabilized ordered segments, such as two to three α-helices. These ordered segments are mostly correlated with the allocation of Motif 2 (Figure 4 and Figure S10 and Table S4).
Figure 4. Protein secondary structure and conformational plasticity. (a) Secondary structure prediction of SUP, ZmRA1-like A, ZmRA1-like B, and RA1 proteins obtained by PsiPred server. α-helix corresponding to the zinc finger domain is represented in orange. α-helix corresponding to Motif 2, putative EAR motif, is represented in violet. Other α-helices are represented in pink. β-strands are represented in yellow. (b) Relative disorder levels of the structures measured in a range of 0 to 1.0 (Figure S10 and Table S4). Levels above 0.5 (dashed line) are considered disordered regions. Abbreviations: ZF, zinc finger domain; EAR, EAR repressor motifs.

Zinc Finger Molecular Dynamics Simulation

The zinc finger of SUP (1NJQ) is a structure resolved by NMR (20 models), which has 75% sequence identity with the zinc finger of RA1 (Figure S11a). There is no experimental structure of RA1. Given differences in sequence identity between SUP and RA1, we performed 1μs molecular dynamics simulations of each protein zinc finger domain to evaluate differences in the structural stability of these variants. The RMSDs of these simulations as a function of simulated time (Figure S11b) show small departures from their initial values (average RMSD values lower than 0.5Å). This fact is associated with the robustness of the structure of the zinc finger domain of both proteins and the proximity of the initial models to the equilibrated structures. We assumed that the simulated time in the period [0.8μs, 1.0μs] is adequate to calculate equilibrium properties of the systems since the RMSDs of the simulations of both proteins show small fluctuations without any appreciable definite tendency.
Analysis of the time evolution of the secondary structure of SUP and RA1 peptides showed the classical pattern of a zinc finger with a β-sheet composed of two strands (Y47T48-F55E56 and Y46T47-F54E55 in SUP and RA1, respectively) and a turn stabilized by H-bonds (S50F51C52 and G49Y50C51 for SUP and RA1), followed by an α-helix (A59:H69 and A58:R72 for SUP and RA1) (Figure 5a, Table S4). The α-helix structural motif is four residues longer for the case of RA1 (an additional turn). Beyond the α-helix structural motif, the SUP peptide presents a turn of a 310-helix (D72:R75) (Figure 5a, Table S4). We observed an additional structure with fluctuating characteristic 310-helix, α-helix, and turns in RA1 peptide (I76:Y80) (Figure 5a, Table S4).
Figure 5. Zinc finger molecular dynamics simulation. (a) Time evolution of the secondary structure of SUP and RA1 proteins as determined by the DSSP algorithm using the simulated structures in the thermodynamics equilibrium interval (0.8 to 1.0 μs). At the side of the graph, the amino acid sequences are quoted with their residue number in the proteins. In the center of the graph, the relative position of each residue in reference to the first one of the α-helix N-terminal is indicated. (b) RMSF of the backbone atoms of each amino acid sequence in the equilibrium interval (0.8 to 1.0 μs), taking as reference for fitting the structures the one with smallest RMSD in the same period.
Previous work suggests that the change of Ala (A) to Gly (G) in the QALGGH motif of the RA1 zinc finger could lead to an enhanced mobility of the residues in the α-helix [2]. However, such a hypothesis has not been tested yet. Here, we performed a comparative RMSF analysis between RA1 and SUP to test whether the G affects the structure dynamic of the zinc finger α-helix (Figure 5b). The RMSF of the SUP and RA1 amino acids were calculated in the equilibrium interval of the simulations for the backbone atoms of each residue. We found that, at this site, the amino acids of RA1 show smaller fluctuations than the respective amino acids of SUP (Figure 5b). One could argue that the effect of the substitution of A by G may not be local; however, we found that the RA1 amino acids show smaller fluctuations along all the extensions of the canonical zinc finger domain. The existence of a longer α-helix motif in the RA1 peptide, followed by the additional helix pattern, causes the region of small fluctuations or relative rigidity of the peptide to extend well beyond the canonical zinc finger motif (Figure 5b). These results indicate that substituting A for G does not impact the three-dimensional structural mobility at that site, nor does it affect the overall structure of the zinc finger.

Cis-Acting Regulatory Elements Analysis among RA1 and RA1-Like Promoter Sequences

When the promoter of RA1 and RA1-like genes of 16 grass species were examined, many regulatory motifs were identified and grouped into 33 types of elements, according to the corresponding transcription factor (TF) family (Figure 6Table S5). Conserved sequences containing binding motifs for well-known transcriptional regulators were recognized. We identified TF motifs involved in different biological processes related to (1) seed germination, such as embryo development, somatic embryogenesis, and positive regulation of cell population proliferation; (2) plant development, such as regulation of secondary shoot formation, leaf development, flower development, and root development; (3) response to abiotic stress, such as response to salt stress, response to light stimulus, and response to water; and (4) response to biotic stress, such as defense response to bacterium (Table S6). In addition, we identified TF binding sites related to positive or negative regulation of hormone signaling pathways, hormone biosynthetic processes, as well as auxin, gibberellin, abscisic acid, and ethylene responses (Table S6).
Figure 6. Noncoding cis-elements identified in predicted promoter regions of RA1 and RA1-like genes from grass species. Colored circles represent presence of conserved motifs in the promoter. Motif consensus sequences are showed in Table S5.
The conserved regulatory motifs harbored nine putative TF binding sites for RA1-like A lineage, 22 putative TF binding sites for RA1-like B lineage, and 13 putative TF binding sites for RA1 lineage. We found two conserved motifs shared by RA1-like and RA1 promoter sequences: (1) ABI-motifs are present among at least 24 of 35 promoter sequences analyzed; (2) WRKY-motifs were identified in 12 of 35 promoter sequences.
RA1-like A shares (1) TCP-motifs with the promoter sequences of RA1-like B of the PACMAD clade, except for the ones of Andropogoneae, and (2) DREB1E-motifs with Andropogoneae promoter sequences of RA1.
RA1-like B has five motifs in common with RA1: (1) ARF-motifs are present in all RA1-like B and RA1 promoter sequences of the Andropogoneae tribe; (2) AGL-motifs are well conserved in RA1-like B promoter sequences, except in the Andropogoneae tribe, while AGL-motifs are restricted to the Andropogoneae tribe in RA1 lineage; (3) MYB-motifs were identified in RA1-like B sequences of the PACMAD and RA1 promoter sequences from the Andropogoneae tribe; (4–5) MNB1A-motif and ERF-motifs were identified among RA1-like B promoter sequences of the PACMAD clade, except for the ones of Andropogoneae, and RA1 promoter sequences from the Andropogoneae tribe.
We found that each species lineage copy is characterized by unique cis-elements. The RA1-like A lineage has conserved BPC-motifs outside the Andropogoneae tribe and a RA1 binding motif in the Andropogoneae tribe. TGA9-motif, GRP9-motif, and HHO3-motif were identified only in promoter sequences from PACMAD clade species outside the Andropogoneae tribe. The RA1-like B lineage is characterized by the presence of NAC-motifs on all promoter sequences analyzed, except for the one of T. intermedium. OsRR22-motif, BZIP48-motif, and GATA20-motif are restricted to the BOP clade. Between species from the PACMAD clade, bHLH130-motif, ANL2-motif, and an unknown motif were identified in the Andropogoneae tribe, while PLT1-motif, FUS3-motif, ABF2-motif, Zm00001d027846-motif, ATHB-12-motif, AHL20-motif, and DOF3.6-motif were identified outside the Andropogoneae tribe. Finally, the promoter sequences of the RA1 lineage are characterized by the presence of (1) SPL14-motif, TB1-motif, CDF5-motif, and O2-motif sequences in the Andropogoneae tribe and (2) SMZ-motif outside the Andropogoneae tribe.