ICAR ASRB NET – Bioinformatics 2023 model paper
Q1. Which of the following tools is used for RNA-Seq data alignment?
A) BLAST
B) TopHat
C) ORF Finder
D) SignalP
Answer: B
Explanation: TopHat is a splice-aware aligner that maps RNA-Seq reads to a reference genome and identifies splice junctions; it has since been largely superseded by HISAT2 and STAR.
Q2. The database that provides 3D structures of biological macromolecules is:
A) GenBank
B) PDB
C) DDBJ
D) GEO
Answer: B
Explanation: Protein Data Bank (PDB) archives 3D structural data of proteins, nucleic acids, and complex assemblies.
Q3. Hidden Markov Models (HMMs) are commonly used in:
A) DNA sequencing
B) Phylogenetic tree construction
C) Protein secondary structure prediction
D) RNA editing
Answer: C
Explanation: HMMs model sequence probabilities and are widely used in secondary structure and domain prediction.
Q4. Which tool is used to identify protein domains and families?
A) ORF Finder
B) BLAST
C) InterProScan
D) Clustal Omega
Answer: C
Explanation: InterProScan integrates several domain databases like Pfam, SMART, and TIGRFAMs to identify protein signatures.
Q5. Which statistical test is used to compare means between two independent samples?
A) ANOVA
B) Chi-square test
C) t-test
D) Mann-Whitney U test
Answer: C
Explanation: The t-test compares the means of two independent groups, assuming the data in each group are approximately normally distributed.
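As a rough illustration of what the test computes, the sketch below calculates Student's t statistic with a pooled variance in plain Python. The function name pooled_t_statistic and the sample values are assumptions made for this example; a real analysis would normally use a statistics library that also returns the p-value.

```python
import math

def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for Student's two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Sample variances (divide by n - 1)
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    df = n_a + n_b - 2
    # Pooled variance assumes both groups share the same true variance
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        raise ValueError("t is undefined when both groups have zero variance")
    return (mean_a - mean_b) / standard_error, df

# Hypothetical measurements for two independent groups
t_value, df = pooled_t_statistic([5, 6, 7], [1, 2, 3])
```

A large absolute t value relative to the t distribution with df degrees of freedom suggests that the two means differ.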
Q6. Which of the following is NOT a primary database?
A) GenBank
B) Swiss-Prot
C) EMBL
D) GEO
Answer: D
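Explanation: GenBank, EMBL, and Swiss-Prot are conventionally listed as primary sequence databases, whereas GEO (Gene Expression Omnibus) is a functional genomics repository for expression data rather than a sequence database.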
Q7. What is the output of a multiple sequence alignment?
A) Conserved regions
B) Phylogenetic trees
C) BLAST scores
D) Molecular weight
Answer: A
Explanation: Multiple sequence alignment helps identify conserved sequences, motifs, and regions of similarity.
Q8. FASTA format starts with a line that begins with:
A) #
B) ;
C) >
D) @
Answer: C
Explanation: The > symbol is used to denote the sequence ID line in FASTA format.
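To show how the > header line separates records, here is a minimal Python sketch that reads FASTA text into a dictionary. The function name parse_fasta and the two sample records are illustrative assumptions, not part of the exam material.

```python
def parse_fasta(text):
    """Return {record_id: sequence} from FASTA-formatted text."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: use the first word after '>' as the record ID
            words = line[1:].split()
            current_id = words[0] if words else ""
            records[current_id] = []
        elif current_id is not None:
            # Sequence lines may be wrapped, so collect and join them later
            records[current_id].append(line)
    return {rid: "".join(parts) for rid, parts in records.items()}

# Hypothetical two-record FASTA input
example = """>seq1 hypothetical example
ATGCGT
ACGT
>seq2
GGCC
"""
sequences = parse_fasta(example)
```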
Q9. The Needleman-Wunsch algorithm is primarily used for:
A) Global alignment
B) Local alignment
C) Primer design
D) RNA folding
Answer: A
Explanation: Needleman-Wunsch performs global alignment of two sequences using dynamic programming.
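The dynamic programming idea can be sketched as follows. This version returns only the optimal global alignment score, without traceback, and the scoring values (match = 1, mismatch = -1, gap = -2) are assumptions chosen for illustration.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap       # gap in b
            left = score[i][j - 1] + gap     # gap in a
            score[i][j] = max(diagonal, up, left)
    # Global alignment: the answer is in the bottom-right cell
    return score[-1][-1]

identical = needleman_wunsch_score("GATTACA", "GATTACA")
one_gap = needleman_wunsch_score("ACGT", "AGT")
```

Because every residue of both sequences must be aligned, the score is read from the final cell rather than from the maximum anywhere in the matrix.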
Q10. Which of the following is NOT a bioinformatics algorithm?
A) Smith-Waterman
B) ClustalW
C) Dynamic Time Warping
D) BLAST
Answer: C
Explanation: Dynamic Time Warping (DTW) comes from signal processing and time-series analysis; it is not a standard bioinformatics algorithm.
Q11. Which of the following is used for de novo genome assembly?
A) BLAST
B) Bowtie
C) SPAdes
D) Clustal Omega
Answer: C
Explanation: SPAdes is a popular tool for assembling genomes from short-read data without a reference.
Q12. Which of the following is a curated protein sequence database?
A) GenBank
B) UniProtKB/Swiss-Prot
C) DDBJ
D) RefSeq
Answer: B
Explanation: Swiss-Prot provides high-quality, manually annotated protein sequence data.
Q13. What is the function of the tool ‘Primer3’?
A) Protein modeling
B) Primer design
C) Pathway prediction
D) Splice site detection
✅ Answer: B
🧠 Explanation: Primer3 is widely used for designing primers for PCR experiments.
Q14. Which command in Python outputs “Bioinformatics”?
A) echo "Bioinformatics"
B) print("Bioinformatics")
C) show "Bioinformatics"
D) printf("Bioinformatics")
✅ Answer: B
🧠 Explanation: The print() function is used in Python to display output.
Q15. FASTQ files store which of the following?
A) Protein sequences
B) Nucleotide sequences with quality scores
C) Alignment information
D) 3D protein structure
✅ Answer: B
🧠 Explanation: FASTQ combines DNA sequence data with per-base quality scores from NGS.
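A FASTQ record has four lines: a header starting with @, the sequence, a + separator, and the quality string. The sketch below decodes the quality string into Phred scores, assuming the Phred+33 encoding used by Sanger and current Illumina output; the function name and sample record are hypothetical.

```python
def parse_fastq_record(lines, offset=33):
    """Parse one 4-line FASTQ record into (read_id, sequence, phred_scores)."""
    if len(lines) != 4:
        raise ValueError("a FASTQ record has exactly four lines")
    header, sequence, separator, quality = [line.strip() for line in lines]
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("malformed FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Each quality character encodes Q = ASCII code - offset
    scores = [ord(ch) - offset for ch in quality]
    return header[1:], sequence, scores

def error_probability(phred):
    """Convert a Phred score to the estimated base-call error probability."""
    return 10 ** (-phred / 10)

# Hypothetical record
record = ["@read1", "ACGT", "+", "II5!"]
read_id, seq, phred = parse_fastq_record(record)
```

A Phred score of 20 corresponds to an estimated error probability of 1 in 100, and 40 to 1 in 10,000.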
Q16. Which database provides functional classification of proteins based on gene ontology terms?
A) KEGG
B) Pfam
C) GO
D) BRENDA
✅ Answer: C
🧠 Explanation: Gene Ontology (GO) offers structured vocabulary for molecular function, biological process, and cellular component.
Q17. RPKM is used for:
A) Protein structure prediction
B) Gene expression normalization
C) Variant calling
D) Sequence alignment
✅ Answer: B
🧠 Explanation: RPKM (Reads Per Kilobase of transcript per Million mapped reads) normalizes RNA-Seq data.
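The normalization can be written as a one-line formula: reads mapped to the gene, divided by the gene length in kilobases and by the total mapped reads in millions. The function name and numbers below are assumptions for illustration.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total reads must be positive")
    length_kb = gene_length_bp / 1_000
    reads_in_millions = total_mapped_reads / 1_000_000
    return gene_reads / (length_kb * reads_in_millions)

# Hypothetical gene: 500 reads, 2,000 bp long, 10 million mapped reads in total
value = rpkm(500, 2_000, 10_000_000)
```

Here 500 reads over 2 kb and 10 million mapped reads give 500 / (2 × 10) = 25 RPKM.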
Q18. A volcano plot is used in:
A) Protein modeling
B) Expression analysis
C) Sequence alignment
D) Clustering
✅ Answer: B
🧠 Explanation: Volcano plots visualize differential gene expression (fold-change vs. statistical significance).
Q19. Which package in R is commonly used for gene expression analysis?
A) ggplot2
B) dplyr
C) Bioconductor
D) tidyr
✅ Answer: C
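🧠 Explanation: Bioconductor is an open-source project and repository of R packages for the analysis of genomic data, including microarrays and RNA-Seq; ggplot2, dplyr, and tidyr are general-purpose plotting and data-manipulation packages.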
Q20. Which of the following file formats stores phylogenetic trees?
A) FASTA
B) GFF
C) Newick
D) BAM
✅ Answer: C
🧠 Explanation: Newick format represents tree structure using nested parentheses.
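As a small illustration of the nested-parentheses notation, the sketch below pulls the leaf names out of a Newick string with a regular expression. It is a simplification that assumes unquoted labels; the function name and the example tree are hypothetical.

```python
import re

def newick_leaf_names(newick):
    """Return leaf labels from a Newick string with unquoted names."""
    # A leaf label directly follows "(" or ","; a label that follows ")"
    # names an internal node and is therefore not captured.
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)

# Hypothetical tree with branch lengths and one labelled internal node (E)
tree = "(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5);"
leaves = newick_leaf_names(tree)
```

The inner group (C:0.3,D:0.4) is a clade of two leaves, and the numbers after each colon are branch lengths.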
Q21. Which tool is used for RNA secondary structure prediction?
A) SignalP
B) RNAfold
C) Bowtie
D) BLAST
✅ Answer: B
🧠 Explanation: RNAfold predicts minimum free energy structures of RNA sequences.
Q22. What does “E-value” in BLAST indicate?
A) Number of alignments
B) Expected number of hits by chance
C) Alignment score
D) GC content
✅ Answer: B
🧠 Explanation: The E-value estimates the number of alignments with an equal or better score that would be expected by chance in a database of that size; lower E-values indicate more significant hits.
Q23. Which of the following is an unsupervised machine learning technique?
A) Random Forest
B) Support Vector Machine
C) Hierarchical Clustering
D) Naïve Bayes
✅ Answer: C
🧠 Explanation: Clustering methods (e.g., hierarchical) do not rely on predefined class labels.
Q24. Which scripting language is most commonly used for pipeline management in bioinformatics?
A) Java
B) Bash
C) C++
D) HTML
✅ Answer: B
🧠 Explanation: Bash is used to automate tasks and manage large datasets and workflows.
Q25. Which command is used in Linux to list directory contents?
A) list
B) ls
C) show
D) dir
✅ Answer: B
🧠 Explanation: ls is the standard command to list files and folders in a directory.
Q26. KEGG is mainly used for:
A) Protein structure
B) Pathway mapping
C) Phylogeny
D) Primer design
✅ Answer: B
🧠 Explanation: KEGG (Kyoto Encyclopedia of Genes and Genomes) maps genes to biochemical pathways.
Q27. Which method is used for dimensionality reduction in transcriptomics data?
A) SVM
B) Random Forest
C) PCA
D) BLAST
✅ Answer: C
🧠 Explanation: PCA (Principal Component Analysis) reduces dimensionality while preserving variance in data.
Q28. The file format used to store sequence alignments is:
A) FASTA
B) SAM
C) GTF
D) VCF
✅ Answer: B
🧠 Explanation: SAM (Sequence Alignment/Map) stores read alignments in a tab-delimited text format; BAM is its compressed binary equivalent.
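Each alignment line in a SAM file has eleven mandatory tab-separated fields, followed by optional tags. The sketch below splits one such line and interprets two bits of the FLAG field; the function name and the sample line are assumptions for illustration.

```python
SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]

def parse_sam_line(line):
    """Parse one SAM alignment line into a dictionary."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    record = dict(zip(SAM_FIELDS, columns[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])
    # FLAG is a bit field: 0x4 = read unmapped, 0x10 = reverse strand
    record["unmapped"] = bool(record["flag"] & 0x4)
    record["reverse_strand"] = bool(record["flag"] & 0x10)
    record["optional"] = columns[11:]
    return record

# Hypothetical alignment of a 4-base read to the reverse strand of chr1
sam_line = "read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
alignment = parse_sam_line(sam_line)
```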
Q29. Which type of sequencing is most suitable for studying epigenetic modifications?
A) RNA-Seq
B) ChIP-Seq
C) ATAC-Seq
D) Exome-Seq
✅ Answer: B
🧠 Explanation: ChIP-Seq maps the genomic binding sites of DNA-associated proteins and the locations of histone modifications.
Q30. The scripting language used for statistical computing and graphics is:
A) Python
B) Perl
C) R
D) Java
✅ Answer: C
🧠 Explanation: R is designed for statistical data analysis and visualization.
Q31. Which of the following is a statistical method used to detect differentially expressed genes?
A) PCA
B) DESeq2
C) Bowtie
D) ClustalW
✅ Answer: B
🧠 Explanation: DESeq2 uses statistical models to determine differential gene expression in RNA-Seq datasets.
Q32. What is the main use of the Ensembl genome browser?
A) 3D protein structure visualization
B) Phylogenetic tree generation
C) Exploration of genome sequences and annotations
D) Primer design
✅ Answer: C
🧠 Explanation: Ensembl provides access to genome-level data, annotations, and comparative genomics tools.
Q33. Which tool is commonly used for aligning short reads to a reference genome?
A) BLAST
B) MUSCLE
C) Bowtie
D) MEME
✅ Answer: C
🧠 Explanation: Bowtie is optimized for fast, memory-efficient alignment of short DNA sequences.
Q34. Which of the following databases is focused on metabolic pathways?
A) UniProt
B) KEGG
C) GO
D) PDB
✅ Answer: B
🧠 Explanation: KEGG provides maps of molecular interaction and reaction networks including metabolism.
Q35. Which programming construct is used for decision-making in Python?
A) loop
B) function
C) if-else
D) return
✅ Answer: C
🧠 Explanation: if-else statements are used to execute code conditionally based on logical tests.
Q36. Which of the following is an open-source platform for analyzing high-throughput genomic data?
A) MATLAB
B) Galaxy
C) Excel
D) Notepad++
✅ Answer: B
🧠 Explanation: Galaxy allows accessible, reproducible bioinformatics analysis on the web.
Q37. What is a contig in genome assembly?
A) A single DNA read
B) An alignment result
C) A set of overlapping DNA segments
D) A protein domain
✅ Answer: C
🧠 Explanation: Contigs are stretches of sequence formed from overlapping reads during genome assembly.
Q38. What does the x-axis of a volcano plot typically represent?
A) Gene expression counts
B) Log2 fold change
C) P-values
D) Gene length
✅ Answer: B
🧠 Explanation: The x-axis represents log2 fold change, indicating up/downregulation of genes.
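The two coordinates of a volcano plot point can be computed as shown in this sketch: log2 fold change for the x-axis and -log10 of the p-value for the y-axis. The function name, the pseudocount of 1 (added to avoid division by zero), and the sample values are assumptions for illustration.

```python
import math

def volcano_point(mean_control, mean_treated, p_value, pseudocount=1.0):
    """Return (log2 fold change, -log10 p-value) for one gene."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    # The pseudocount is an illustrative choice, not a fixed convention
    ratio = (mean_treated + pseudocount) / (mean_control + pseudocount)
    return math.log2(ratio), -math.log10(p_value)

# Hypothetical gene: expression rises from 3 to 15, with p = 0.001
x, y = volcano_point(3.0, 15.0, 0.001)
```

Positive x values indicate upregulation and negative values downregulation, while larger y values indicate stronger statistical significance.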
Q39. The FASTA format contains:
A) Sequence only
B) Sequence and alignment
C) Sequence and quality
D) Structure and sequence
✅ Answer: A
🧠 Explanation: FASTA stores nucleotide or protein sequences, usually with a single-line header and raw sequence.
Q40. Which of the following measures dispersion in a data set?
A) Mean
B) Mode
C) Median
D) Standard deviation
✅ Answer: D
🧠 Explanation: Standard deviation indicates how spread out the values are from the mean.
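A short example using Python's standard statistics module shows the difference between the population and sample forms of the standard deviation. The data values are hypothetical.

```python
import statistics

# Hypothetical measurements with mean 5
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation divides the squared deviations by n
population_sd = statistics.pstdev(values)

# Sample standard deviation divides by n - 1
sample_sd = statistics.stdev(values)
```

The squared deviations from the mean sum to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7.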
Q41. Which alignment algorithm uses dynamic programming?
A) BLAST
B) Smith-Waterman
C) Bowtie
D) MAUVE
✅ Answer: B
🧠 Explanation: Smith-Waterman provides optimal local alignment using dynamic programming.
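A minimal sketch of the local alignment recurrence is given below. It returns only the best local score, without traceback, and the scoring values (match = 2, mismatch = -1, gap = -2) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    best = 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero lets an alignment restart anywhere (local alignment)
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best

# The shared substring "ACG" gives the best local score here
local_score = smith_waterman_score("TTTACGTTT", "GGACGGG")
```

Unlike global alignment, cell scores never drop below zero and the result is the maximum over the whole matrix rather than the final cell.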
Q42. Which language is preferred for statistical tests and data visualization in bioinformatics?
A) Java
B) Python
C) R
D) C++
✅ Answer: C
🧠 Explanation: R is specially developed for statistical computing and visualization.
Q43. A Manhattan plot is typically used in:
A) Protein interaction studies
B) RNA splicing analysis
C) GWAS studies
D) Sequence alignment
✅ Answer: C
🧠 Explanation: Manhattan plots visualize significant SNP associations in genome-wide association studies.
Q44. Which of the following file formats contains variant information?
A) SAM
B) FASTA
C) GTF
D) VCF
✅ Answer: D
🧠 Explanation: VCF (Variant Call Format) stores information about sequence variants like SNPs and indels.
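A VCF data line starts with eight fixed tab-separated columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, and INFO. The sketch below reads those columns and labels each alternate allele as a SNP or an indel by comparing allele lengths; this classification is a simplification, and the function names and sample line are hypothetical.

```python
def classify_allele(ref, alt):
    """Label an alternate allele by comparing its length with REF (simplified)."""
    if alt.startswith("<") or alt in ("*", "."):
        return "other"          # symbolic, spanning-deletion, or missing allele
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"              # e.g. multi-nucleotide substitution

def parse_vcf_line(line):
    """Parse the fixed columns of one VCF data line."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 8:
        raise ValueError("a VCF data line needs 8 fixed columns")
    chrom, pos, variant_id, ref, alt, qual, filt, info = columns[:8]
    alts = alt.split(",")       # several ALT alleles are comma-separated
    return {
        "chrom": chrom, "pos": int(pos), "id": variant_id,
        "ref": ref, "alts": alts, "filter": filt, "info": info,
        "types": [classify_allele(ref, a) for a in alts],
    }

# Hypothetical site with one SNP allele and one insertion allele
variant = parse_vcf_line("chr1\t12345\t.\tA\tG,AT\t50\tPASS\tDP=20")
```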
Q45. The default plot function in R creates what type of plot?
A) Pie chart
B) Line plot
C) Scatter plot
D) Heatmap
✅ Answer: C
🧠 Explanation: The default plot() function in R often produces scatter plots for two numeric variables.
Q46. BLASTX translates:
A) Protein to DNA
B) DNA to protein
C) DNA in six frames to proteins
D) RNA to DNA
✅ Answer: C
🧠 Explanation: BLASTX translates a nucleotide sequence in six frames and aligns to a protein database.
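The six reading frames are the three offsets on the forward strand plus the three offsets on the reverse complement. The sketch below illustrates the translation step only (not the database search), using the standard genetic code; the function names, the use of X for unrecognized codons, and the sample sequence are assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons; '*' marks a stop, 'X' an unknown codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def six_frame_translation(dna):
    """Return translations for frames +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames

# Hypothetical 9-base query
frames = six_frame_translation("ATGGCCTAA")
```

Each of the six protein strings would then be compared against the protein database, which is why BLASTX can find coding regions on either strand.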
Q47. Which tool is best suited for predicting protein 3D structure from sequence?
A) BLAST
B) TM-align
C) AlphaFold
D) CD-HIT
✅ Answer: C
🧠 Explanation: AlphaFold is a revolutionary AI tool for predicting 3D protein structures.
Q48. The correlation coefficient ranges from:
A) -2 to 2
B) 0 to 1
C) -1 to 1
D) -∞ to ∞
✅ Answer: C
🧠 Explanation: The Pearson correlation coefficient lies between -1 and 1, indicating strength and direction of a linear relationship.
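To make the range concrete, this sketch computes the Pearson coefficient in plain Python as the covariance divided by the product of the standard deviations. The function name and data are assumptions for illustration.

```python
import math

def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)

perfect_positive = pearson_r([1, 2, 3], [2, 4, 6])
perfect_negative = pearson_r([1, 2, 3], [6, 4, 2])
```

A perfectly increasing linear relationship gives 1, a perfectly decreasing one gives -1, and values near 0 indicate little linear association.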
Q49. Which tool clusters proteins by sequence identity?
A) CD-HIT
B) MUSCLE
C) BLAST
D) MAFFT
✅ Answer: A
🧠 Explanation: CD-HIT clusters similar protein or nucleotide sequences based on identity threshold.
Q50. Which of the following methods is used for bootstrapping in phylogenetics?
A) Maximum Parsimony
B) BLAST
C) Neighbor-Joining
D) Resampling
✅ Answer: D
🧠 Explanation: Bootstrapping uses resampling techniques to assess the reliability of phylogenetic trees.
Q51. What does FPKM stand for in RNA-Seq data analysis?
A) Fold Per Kilobase of Million
B) Fragments Per Kilobase of Exon per Million
C) Frequency Per Kilobase of Exon
D) Functional Protein Kinase Model
✅ Answer: B
🧠 Explanation: FPKM normalizes RNA-Seq fragment counts by transcript length and by the total number of mapped fragments (sequencing depth).
Q52. Which database is primarily used for protein sequences and functions?
A) GenBank
B) UniProt
C) KEGG
D) RCSB PDB
✅ Answer: B
🧠 Explanation: UniProt is a comprehensive resource for protein sequences and functional information.
Q53. What does the p.adjust() function in R do?
A) Adjusts P-values for multiple testing
B) Normalizes data
C) Generates plots
D) Fits regression models
✅ Answer: A
🧠 Explanation: p.adjust() applies multiple testing correction methods such as Bonferroni and FDR.
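For readers who want to see what the two named corrections do, here is a Python sketch of the Bonferroni and Benjamini-Hochberg (FDR) adjustments. It is intended to mirror the arithmetic of those two methods rather than reproduce R's implementation, and the function names and p-values are assumptions.

```python
def adjust_bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]

def adjust_bh(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    # Walk from the largest p-value to the smallest
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for steps_from_top, index in enumerate(order):
        rank = n - steps_from_top            # rank in ascending order, 1..n
        candidate = p_values[index] * n / rank
        # The running minimum keeps adjusted values monotone in rank
        running_min = min(running_min, candidate)
        adjusted[index] = running_min
    return adjusted

# Hypothetical p-values from four tests
raw = [0.01, 0.04, 0.03, 0.005]
bonferroni = adjust_bonferroni(raw)
bh = adjust_bh(raw)
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate.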
Q54. Which command is used in Python to install packages via pip?
A) install pip package
B) pip download
C) pip install package_name
D) python package install
✅ Answer: C
🧠 Explanation: pip install package_name is the standard syntax to install packages.
Q55. Which of these is a supervised machine learning method?
A) K-means clustering
B) Principal Component Analysis
C) Support Vector Machine
D) t-SNE
✅ Answer: C
🧠 Explanation: SVM is a supervised classification algorithm used widely in bioinformatics.
Q56. Which format is used to represent genome annotations?
A) FASTQ
B) BED
C) GTF
D) PDB
✅ Answer: C
🧠 Explanation: GTF (Gene Transfer Format) is commonly used to represent gene and transcript annotations.
Q57. Which of these is not a DNA sequencing technology?
A) Illumina
B) PacBio
C) Nanopore
D) NCBI
✅ Answer: D
🧠 Explanation: NCBI (National Center for Biotechnology Information) is an institution that hosts databases and tools; the others are sequencing platforms.
Q58. Which is a type of error commonly corrected in sequence data preprocessing?
A) Segmentation fault
B) Adapter contamination
C) Kernel panic
D) Package misconfiguration
✅ Answer: B
🧠 Explanation: Adapter sequences must be removed before sequence alignment to ensure accuracy.
Q59. Which of the following uses Hidden Markov Models (HMMs)?
A) Clustal Omega
B) InterProScan
C) BLAST
D) GSEA
✅ Answer: B
🧠 Explanation: InterProScan integrates HMM-based methods to predict protein domains.
Q60. Which is the correct command to read a CSV file in R?
A) read.txt()
B) import.csv()
C) read_csv()
D) read.csv()
✅ Answer: D
🧠 Explanation: read.csv() is the built-in function in R to read CSV files.
Q61. What does ROC curve stand for?
A) Ratio of Concentration
B) Rate of Coverage
C) Receiver Operating Characteristic
D) Range of Correlation
✅ Answer: C
🧠 Explanation: ROC curves evaluate the performance of classification models.
Q62. In structural bioinformatics, RMSD is used to:
A) Compare DNA sequences
B) Visualize metabolic pathways
C) Compare 3D molecular structures
D) Normalize RNA-Seq data
✅ Answer: C
🧠 Explanation: RMSD (Root Mean Square Deviation) quantifies structural similarity.
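RMSD is the square root of the mean squared distance between matched atoms. The sketch below assumes the two coordinate sets are already superimposed and listed in the same atom order; the function name and coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))

# Hypothetical two-atom structures, the second shifted by 1 unit along z
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(structure_a, structure_b)
```

Lower values indicate more similar structures; identical coordinates give an RMSD of 0.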
Q63. Which of the following is a multiple sequence alignment tool?
A) MEME
B) BLASTN
C) T-Coffee
D) AlphaFold
✅ Answer: C
🧠 Explanation: T-Coffee is a reliable tool for multiple sequence alignments.
Q64. In R, which function is used to calculate correlation?
A) mean()
B) var()
C) cor()
D) test()
✅ Answer: C
🧠 Explanation: cor() computes the Pearson (default), Spearman, or Kendall correlation between vectors.
Q65. Which file contains both sequence and quality information?
A) FASTA
B) PDB
C) FASTQ
D) BED
✅ Answer: C
🧠 Explanation: FASTQ contains nucleotide sequences and corresponding quality scores.
Q66. What does PCA reduce in high-dimensional data?
A) Mean
B) Bias
C) Variance
D) Dimensionality
✅ Answer: D
🧠 Explanation: Principal Component Analysis (PCA) reduces the number of variables (dimensions) in the data.
Q67. Which of the following is used to assign functions to uncharacterized proteins?
A) Homology modeling
B) Docking
C) Domain prediction
D) Functional annotation via BLAST
✅ Answer: D
🧠 Explanation: BLAST can suggest putative functions by identifying homologs with known functions.
Q68. A gene ontology (GO) term describes:
A) Protein 3D structure
B) Gene sequence length
C) Biological function, process, or component
D) Regulatory elements
✅ Answer: C
🧠 Explanation: GO terms categorize genes/proteins by molecular function, biological process, or cellular component.
Q69. Which software tool is used for phylogenetic tree visualization?
A) MEGA
B) GATK
C) STAR
D) DESeq2
✅ Answer: A
🧠 Explanation: MEGA is widely used for building and visualizing evolutionary trees.
Q70. What is a common statistical test for comparing two sample means?
A) Chi-square test
B) ANOVA
C) t-test
D) Mann-Whitney U test
✅ Answer: C
🧠 Explanation: The t-test is used to determine if the means of two groups are significantly different.
Q71. In genome assembly, which term refers to contiguous sequences built from overlapping reads?
A) Isoforms
B) Contigs
C) Exons
D) Scaffolds
✅ Answer: B
🧠 Explanation: Contigs are sequences formed by overlapping DNA fragments in genome assembly.
Q72. The Needleman-Wunsch algorithm is used for:
A) Genome annotation
B) Phylogenetic tree construction
C) Global sequence alignment
D) Primer design
✅ Answer: C
🧠 Explanation: It performs optimal global alignment between two sequences using dynamic programming.
Q73. What type of plot is commonly used to visualize differential expression results in RNA-Seq?
A) Scatter plot
B) Manhattan plot
C) Volcano plot
D) Heatmap
✅ Answer: C
🧠 Explanation: Volcano plots show significance vs fold change in gene expression.
Q74. What is the output format of the BLAST tool?
A) XML
B) JSON
C) HTML
D) All of the above
✅ Answer: D
🧠 Explanation: BLAST output can be saved in multiple formats including HTML, XML, plain text, and JSON.
Q75. In Python, what is the output of len("Bioinformatics")?
A) 13
B) 12
C) 14
D) 11
✅ Answer: C
🧠 Explanation: The word “Bioinformatics” has 14 characters (B-i-o-i-n-f-o-r-m-a-t-i-c-s), so len() returns 14.
Q76. Which database provides information about metabolic pathways and chemical reactions?
A) UniProt
B) PDB
C) KEGG
D) DDBJ
✅ Answer: C
🧠 Explanation: KEGG links genomic information to pathway maps of metabolism and other molecular interaction and reaction networks.
Q77. The process of converting mRNA into cDNA is known as:
A) Translation
B) Replication
C) Reverse transcription
D) PCR
✅ Answer: C
🧠 Explanation: Reverse transcription synthesizes complementary DNA from RNA templates.
Q78. Which type of omics data would be used to study metabolite levels in cells?
A) Transcriptomics
B) Proteomics
C) Genomics
D) Metabolomics
✅ Answer: D
🧠 Explanation: Metabolomics is the study of small molecules (metabolites) in biological systems.
Q79. Which software is commonly used for docking studies?
A) AutoDock
B) GROMACS
C) MEGA
D) ClustalW
✅ Answer: A
🧠 Explanation: AutoDock is widely used for ligand-protein docking studies.
Q80. Which algorithm is used in local sequence alignment?
A) BLAST
B) Smith-Waterman
C) Needleman-Wunsch
D) Clustal Omega
✅ Answer: B
🧠 Explanation: Smith-Waterman is a dynamic programming algorithm used for optimal local alignments.
Q81. Which of these is not a statistical programming language?
A) R
B) Python
C) Perl
D) HTML
✅ Answer: D
🧠 Explanation: HTML is a markup language, not used for statistical computation.
Q82. What is the main goal of clustering in bioinformatics?
A) Predict 3D structure
B) Group similar data points
C) Perform sequence alignment
D) Build phylogenetic trees
✅ Answer: B
🧠 Explanation: Clustering is used for unsupervised grouping of genes, proteins, or samples.
Q83. What does the lm() function in R perform?
A) Logistic mapping
B) Linear modeling
C) Log mean calculation
D) List matrix creation
✅ Answer: B
🧠 Explanation: lm() fits linear models in R.
Q84. In high-throughput sequencing, which step comes immediately after sequencing?
A) PCR
B) Read alignment
C) cDNA synthesis
D) Library preparation
✅ Answer: B
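🧠 Explanation: Among the listed steps, read alignment follows sequencing: once reads are generated (and typically quality-checked), they are aligned to the reference genome. PCR, cDNA synthesis, and library preparation all happen before sequencing.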
Q85. Which algorithm is used for motif discovery in DNA sequences?
A) MEME
B) MUSCLE
C) BLAST
D) MAFFT
✅ Answer: A
🧠 Explanation: MEME is used for finding conserved motifs in sequences.
Q86. Which R package is widely used for differential gene expression analysis?
A) GATK
B) DESeq2
C) ggplot2
D) edgeR
✅ Answer: B
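🧠 Explanation: DESeq2 is a standard Bioconductor package for RNA-Seq differential expression analysis. edgeR (option D) is also widely used for this purpose, but DESeq2 is the intended answer.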
Q87. What is the basic unit of protein structure?
A) Nucleotides
B) Amino acids
C) Sugars
D) Codons
✅ Answer: B
🧠 Explanation: Proteins are made up of amino acids.
Q88. In systems biology, the nodes of a network represent:
A) DNA sequences
B) Reactions
C) Entities such as genes or proteins
D) Experimental replicates
✅ Answer: C
🧠 Explanation: Nodes often represent biological entities in networks.
Q89. Which function in Python is used to generate random numbers?
A) math.rand()
B) random()
C) random.randint()
D) numpy.int()
✅ Answer: C
🧠 Explanation: random.randint() generates random integers.
Q90. Which of the following is used to compare expression levels across different samples?
A) FASTA
B) VCF
C) RPKM
D) GFF
✅ Answer: C
🧠 Explanation: RPKM (Reads Per Kilobase of transcript per Million mapped reads) normalizes RNA-Seq expression data for gene length and sequencing depth so that expression levels can be compared.
Q91. Which of the following is a file format for multiple sequence alignments?
A) FASTQ
B) PDB
C) CLUSTAL
D) SAM
✅ Answer: C
🧠 Explanation: CLUSTAL format is used to store multiple sequence alignment data.
Q92. What does the term “bootstrapping” refer to in phylogenetic analysis?
A) Visualizing a tree
B) Removing gaps
C) Testing reliability of tree branches
D) Sequence shuffling
✅ Answer: C
🧠 Explanation: Bootstrapping provides confidence levels for tree branches in phylogenetic trees.
Q93. In supervised machine learning, which of the following is a typical input?
A) Unlabeled data
B) Labeled training data
C) FASTA files
D) Protein models
✅ Answer: B
🧠 Explanation: Supervised learning uses labeled training data to learn predictive models.
Q94. Which software is commonly used for molecular dynamics simulations?
A) AutoDock
B) MEGA
C) GROMACS
D) Geneious
✅ Answer: C
🧠 Explanation: GROMACS is widely used for simulating protein and biomolecular dynamics.
Q95. What is the purpose of PCA in bioinformatics?
A) Sequence alignment
B) Dimensionality reduction
C) Gene prediction
D) Molecular docking
✅ Answer: B
🧠 Explanation: PCA (Principal Component Analysis) is used to reduce dimensions and visualize data structure.
Q96. The FASTQ file format contains:
A) Amino acid sequences
B) Sequence IDs and structures
C) DNA sequence and quality scores
D) RNA transcriptome sequences
✅ Answer: C
🧠 Explanation: FASTQ format combines nucleotide sequences and their base quality scores.
Q97. Which term describes gene sets with similar expression profiles?
A) Isoforms
B) Modules
C) Codons
D) Transposons
✅ Answer: B
🧠 Explanation: Gene modules are clusters of genes with similar expression patterns across conditions.
Q98. What is an example of unsupervised machine learning in bioinformatics?
A) Support Vector Machine
B) Logistic Regression
C) K-means clustering
D) Decision tree
✅ Answer: C
🧠 Explanation: K-means is used for unsupervised clustering of samples or features.
Q99. In proteomics, trypsin is used for:
A) Ligand binding
B) Protein quantification
C) Peptide digestion
D) Structure prediction
✅ Answer: C
🧠 Explanation: Trypsin cleaves proteins on the C-terminal side of lysine and arginine residues, producing peptides suitable for mass spectrometry analysis.
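An in-silico digestion can be sketched in a few lines of Python by cutting after every lysine (K) or arginine (R). The option to skip cleavage when the next residue is proline (P) reflects a commonly applied simplifying rule and is an assumption here, as are the function name and the sample sequence.

```python
def trypsin_digest(protein, skip_before_proline=True):
    """Split a protein sequence into peptides by cutting after K or R."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        if residue not in "KR":
            continue
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        # Commonly applied rule (an assumption here): no cut before proline
        if skip_before_proline and next_residue == "P":
            continue
        peptides.append(protein[start:i + 1])
        start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])   # C-terminal peptide without K/R
    return peptides

# Hypothetical protein sequence
with_rule = trypsin_digest("MKTAYRPLLKGG")
without_rule = trypsin_digest("MKTAYRPLLKGG", skip_before_proline=False)
```

The resulting peptide list is what a database search engine would compare against observed mass spectra.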
Q100. The central dogma of molecular biology describes:
A) Protein to RNA to DNA
B) RNA to DNA to protein
C) DNA to RNA to protein
D) DNA to protein to RNA
✅ Answer: C
🧠 Explanation: DNA is transcribed to RNA, which is then translated to protein.