Previous Experience and
Seth Dobrin received his PhD in Genetics, with an emphasis on statistical and molecular genetics applied to human behavioral disorders, from Arizona State University.
Prior to his role at IBM, Seth Dobrin worked at the intersection of data, AI, and genetics for over 20 years at renowned companies such as Monsanto and Motorola.
Seth's unique background led him to develop and implement long-term strategies encompassing the automation of the day-to-day operations of high-throughput genotyping labs and the transformation of IT systems and data analysis pipelines.
Seth's forward-looking vision and strategic thinking about the use of data and machine learning techniques to lead the digital transformation of the companies for which he worked made him an exceptional leader in the field of Data & AI.
Research Studies Genetics
Below is a selection of Seth's academic publications and research.
Study on essential derivation in maize: III. Selection and evaluation of a panel of single nucleotide polymorphism loci for use in European and North American germplasm
Crop Science 55 (3), 1170-1180
Abstract: Pairwise distance data for maize (Zea mays L.) inbred lines generated using sets of single nucleotide polymorphisms (SNPs) selected from a 50k Infinium array were compared with pairwise distances generated using a set of 163 simple sequence repeat (SSR) loci previously identified to help determine essentially derived variety (EDV) status (UPOV, 1991). Final comparisons were made using 26,874 SNPs after discarding SNPs with insufficient data quality or vulnerability to ascertainment bias. Inbred lines developed in the United States or in western Europe that had been previously published to establish SSR-based thresholds provided the means to determine equivalent SNP-based protocols. Use of 3072 SNPs selected to provide even genomic coverage according to genetic and physical maps provided robust, precise, high discrimination among inbred lines with consistent zonal classification with up to 20% missing data. Comparisons of intercepts and slopes for SSR and SNP inbred pairwise distance data translated the 82% SSR green-orange similarity threshold to 91% using SNPs and the 90% SSR orange-red threshold to 95% using SNPs. Information required to conduct analyses using these 3072 SNPs is presented.
Yves Rousselle, Elizabeth Jones, Alain Charcosset, Philippe Moreau, Kelly Robbins, Benjamin Stich, Carsten Knaak, Pascal Flament, Zivian Karaman, Jean-Pierre Martinant, Michael Fourneau, Alain Taillardat, Michel Romestant, Claude Tabel, Javier Bertran, Nicolas Ranc, Denis Lespinasse, Philippe Blanchard, Alex Kahler, Jialiang Chen, Jonathan Kahler, Seth Dobrin, Todd Warner, Ron Ferris, Stephen Smith
Quantitative Trait Locus Analysis of Carotid Atherosclerosis in an Intercross Between C57BL/6 and C3H Apolipoprotein E–Deficient Mice
Background and Purpose— Inbred mouse strains C57BL/6J (B6) and C3H/HeJ (C3H) exhibit marked differences in atherosclerotic lesion formation in the carotid arteries on the apolipoprotein E–deficient (apoE−/−) background when fed a Western diet. Quantitative trait locus analysis was performed on an intercross between B6.apoE−/− and C3H.apoE−/− mice to determine genetic factors contributing to variation in the phenotype.
Methods— Female B6.apoE−/− mice were crossed with male C3H.apoE−/− mice to generate F1 hybrids, which were intercrossed to generate 241 female F2 progeny. At 6 weeks of age, F2 mice were started on a Western diet. After being fed the diet for 12 weeks, F2 mice were analyzed for phenotypes such as lesion size in the left carotid arteries and plasma lipid levels and typed for 154 genetic markers spanning the mouse genome.
Results— One significant quantitative trait locus, named CAth1 (25 cM, log of the odds score: 4.5), on chromosome 12 and 4 suggestive quantitative trait loci, on chromosomes 1, 5, 6, and 11, respectively, were identified to influence carotid lesion size. One significant quantitative trait locus on distal chromosome 1 accounted for major variations in plasma low-density lipoprotein/very-low-density lipoprotein, high-density lipoprotein cholesterol, and triglyceride levels. Carotid lesion size was not significantly correlated with plasma low-density lipoprotein/very-low-density lipoprotein or high-density lipoprotein cholesterol levels.
Conclusions— These data indicate that the loci for carotid lesions do not overlap with those for aortic lesions as identified in a previous cross derived from the same parental strains, and carotid atherosclerosis and plasma lipids are controlled by separate genetic factors in the B6 and C3H mouse model.
A novel locus for adolescent idiopathic scoliosis on chromosome 12p
Journal of Orthopaedic Research 27 (10), 1366-1372
Abstract: Adolescent idiopathic scoliosis (AIS) is a common disorder with strong evidence for genetic predisposition. Quantitative trait loci (QTLs) for AIS susceptibility have been identified on chromosomes. We performed a genome‐wide genetic linkage scan in seven multiplex families using 400 marker loci with a mean spacing of 8.6 cM. We used Genehunter Plus to generate linkage statistics, expressed as homogeneity (HLOD) scores, under dominant and recessive genetic models. We found a significant linkage signal on chromosome 12p, whose support interval extends from near 12pter, spanning approximately 10 million bases or 31 cM. Fine mapping within the region using 20 additional markers reveals maximum HLOD = 3.7 at 5 cM under a dominant inheritance model, and a split peak maximum HLOD = 3.2 at 8 and 18 cM under a recessive inheritance model. The linkage support interval contains 95 known genes. We found evidence suggestive of linkage on chromosomes 1, 6, 7, 8, and 14. This study is the first to find evidence of an AIS susceptibility locus on chromosome 12. Detection of AIS susceptibility QTLs on multiple chromosomes in this and other studies demonstrates that the condition is genetically heterogeneous.
Combining gene expression, demographic and clinical data in modeling disease: a case study of bipolar disorder and schizophrenia
J Struyf, S Dobrin, D Page. BMC Genomics 9 (1), 531
Abstract: Background – This paper presents a retrospective statistical study on the newly-released data set by the Stanley Neuropathology Consortium on gene expression in bipolar disorder and schizophrenia. This data set contains gene expression data as well as limited demographic and clinical data for each subject. Previous studies using statistical classification or machine learning algorithms have focused on gene expression data only. The present paper investigates if such techniques can benefit from including demographic and clinical data.
Results – We compare six classification algorithms: support vector machines (SVMs), nearest shrunken centroids, decision trees, ensemble of voters, naïve Bayes, and nearest neighbor. SVMs outperform the other algorithms. Using expression data only, they yield an area under the ROC curve of 0.92 for bipolar disorder versus control, and 0.91 for schizophrenia versus control. By including demographic and clinical data, classification performance improves to 0.97 and 0.94 respectively.
Conclusion – This paper demonstrates that SVMs can distinguish bipolar disorder and schizophrenia from normal control at a very high rate. Moreover, it shows that classification performance improves by including demographic and clinical data. We also found that some variables in this data set, such as alcohol and drug use, are strongly associated to the diseases. These variables may affect gene expression and make it more difficult to identify genes that are directly associated to the diseases. Stratification can correct for such variables, but we show that this reduces the power of the statistical methods.
Identifying potential candidate genes in an Irish bipolar disorder sample linked to 14q21-32.
Ulster Medical Journal 2008, Vol. 77, Issue 1, p. 76
Abstract: Bipolar affective disorder (BPAD) is a severe and debilitating psychiatric illness. Family, twin and adoption studies have established a substantial genetic component to the illness but the genes involved have yet to be fully elucidated. A 10cM genome-wide linkage scan (WGS) was performed in a collection of 60 Irish BPAD affected sib pairs to locate chromosomal regions that may harbour susceptibility genes. The most significant result was on chromosome 14 at 75cM (14q24). Since the region of the chromosome containing significant P values was substantial, we undertook a fine-mapping analysis to refine the linkage peak. 144 SNP markers (400kb resolution) were analysed in an extended sample of 88 ASPs. Linkage analysis resolved our original linkage peak into 4 separate peaks, two of which overlap with published linkage peaks for related psychiatric disorders, such as anxiety and alcoholism. The most significant NPL score of 2.71 was at 67.84Mb, remarkably close to the original WGS peak score at 68.2Mb. In an additional analysis, two SNPs were found to be associated with BPAD (rs24166076 at 46.97Mb and rs4902942 at 71.21Mb). This project has substantially refined the region of chromosome 14 predicted to contain a candidate susceptibility gene for BPAD
Candidate gene analysis of 21q22: support for S100B as a susceptibility gene for bipolar affective disorder with psychosis
Am J Med Genet Part B 144B:1094–1096.
Abstract: A genome‐wide scan in 60 bipolar affective disorder (BPAD) affected sib‐pairs (ASPs) identified linkage on chromosome 21 at 21q22 (D21S1446, NPL = 1.42, P = 0.08), a BPAD susceptibility locus supported by multiple studies. Although this linkage only approaches significance, the peak marker is located 12 Kb upstream of S100B, a neurotrophic factor implicated in the pathology of psychiatric disorders, including BPAD and schizophrenia. We hypothesized that the linkage signal at 21q22 may result from pathogenic disease variants within S100B and performed an association analysis of this gene in a collection of 125 BPAD type I trios. S100B single nucleotide polymorphisms (SNPs) rs2839350 (P = 0.022) and rs3788266 (P = 0.031) were significantly associated with BPAD. Since variants within S100B have also been associated with schizophrenia susceptibility, we reanalyzed the data in trios with a history of psychosis, a phenotype in common between the two disorders. SNPs rs2839350 (P = 0.016) and rs3788266 (P = 0.009) were more significantly associated in the psychotic subset. Increased significance was also obtained at the haplotype level. Interestingly, SNP rs3788266 is located within a consensus‐binding site for Six‐family transcription factors suggesting that this variant may directly affect S100B gene expression. Fine‐mapping analyses of 21q22 have previously identified transient receptor potential gene melastatin 2 (TRPM2), which is 2 Mb upstream of S100B, as a possible BPAD susceptibility gene at 21q22. We also performed a family‐based association analysis of TRPM2 which did not reveal any evidence for association of this gene with BPAD. Overall, our findings suggest that variants within the S100B gene predispose to a psychotic subtype of BPAD, possibly via alteration of gene expression. © 2007 Wiley‐Liss, Inc.
Genome‐wide scan of bipolar disorder and investigation of population stratification effects on linkage: Support for susceptibility loci at 4q21, 7q36, 9p21, 12q24, 14q24, and 16p13
Am J Med Genet Part B 144B:791–801.
Abstract: Bipolar disorder (BPD) is a complex genetic disorder with cycling symptoms of depression and mania. Despite the extreme complexity of this psychiatric disorder, attempts to localize genes which confer vulnerability to the disorder have had some success. Chromosomal regions including 4p16, 12q24, 18p11, 18q22, and 21q21 have been repeatedly linked to BPD in different populations. Here we present the results of a whole genome scan for linkage to BPD in an Irish population. Our most significant result was at 14q24 which yielded a non‐parametric LOD (NPL) score of 3.27 at the D14S588 marker with a nominal P‐value of 0.0006 under a narrow (bipolar type I only) model of affection. We previously reported linkage to 14q22‐24 in a subset of the families tested in this analysis. We also obtained suggestive evidence for linkage at 4q21, 9p21, 12q24, and 16p13, chromosomal regions that have all been previously linked to BPD. Additionally, we report on a novel approach to linkage analysis, STRUCTURE‐Guided Linkage Analysis (SGLA), which is designed to reduce genetic heterogeneity and increase the power to detect linkage. Application of this technique resulted in more highly significant evidence for linkage of BPD to three regions including 16p13, a locus that has been repeatedly linked to numerous psychiatric disorders.
Comparison of HapMap data on Affymetrix and Illumina platforms: expanding the power of studies and a not so unexpected synergy
Pharmacogenomics 8 (2), 199-201
Abstract: This study compares and contrasts three different high-density single nucleotide polymorphism genotyping platforms using data generated on the 270 HapMap samples. The differences in minor allele frequencies are evaluated, coverage across the entire genome using r2 and then the coverage of the ENCyclopedia Of DNA Elements (ENCODE) regions is compared using both a single- and multi-point evaluation. All of these analyses are carried out on the three HapMap populations.
Identifying potential candidate genes in an Irish bipolar disorder sample linked to 14q21-32
SA Tishkoff, FA Reed, A Froment, MW Smith, SM Williams, SA Omar, MJ Kotze, GS Pretorius, M Ibrahim, O Doumbo, M Thera, C Wambebe, SE Dobrin, JL Weber
Investigation of susceptibility loci for bipolar affective disorder on chromosome 21
American Journal of Medical Genetics Part B: Neuropsychiatric Genetics 141
Siobhan Roche, Fiona Cassidy, Chengfeng Zhao, Jonathon Badger, Lisa Mooney, Catherine Delaney, Seth Dobrin, Patrick McKeon
Automating microsatellite genotyping with array tape
JALA: Journal of the Association for Laboratory Automation 11 (4), 260-267
Abstract: Our laboratory has been testing ways to reduce costs, sample volumes, and decrease labor in microsatellite (or short tandem repeat polymorphism) genotyping. Microsatellite genotyping involves polymerase chain reaction amplification of a short (100–400 bp) fragment of chromosomal DNA that encompasses the tandem repeats followed by electrophoresis to size the amplification products. Using a continuous polypropylene tape (array tape) embossed with 384-well arrays, conforming to the microtiter plate standard, we have been able to perform the amplification reactions in smaller volumes and to decrease handling of stacks of microtiter plates. Instruments were constructed in-house to achieve these results. However, the problem of removal of the samples from the tape for electrophoresis remained. We report here efficient piercing of the tape seal for extraction of the samples using a CO2 laser. Scoring of the seals with the laser weakens it sufficiently to permit extraction of the samples with a syringe array. CO2 lasers are robust systems that do not contain a lot of frequently replaced parts, and do not require frequent recalibration. In addition, the laser is software controlled allowing for highly reproducible scoring and easily switching between 384-, 1536-, and 96-well formats.
Clinical applications of whole-genome association studies: future applications at the bedside
Expert review of molecular diagnostics 6 (4), 551-565
Abstract: Until now, performing whole-genome association studies has been an unattainable, but highly desirable, goal for geneticists. With the recent advent of high-throughput genotyping platforms, this goal is now a reality for geneticists today and for clinicians in the not-so-distant future. This review will cover a broad range of topics to provide an overview of this emerging branch of genetics, and will provide references to more specific sources. Specifically, this review will cover the technologies available today and in the near future, the specific types of whole-genome association studies, the benefits and limitations of these studies, the applications to complex disease–gene interactions, diagnostic devices, therapeutics, and finally, we will describe the 5-year perspective and key issues.
Recessive symptomatic focal epilepsy and mutant contactin-associated protein-like 2
New England Journal of Medicine 354 (13), 1370-1377
Abstract: Most epileptic disorders can be traced to an abnormality of cortical architecture, channel-mediated currents, neuronal growth and differentiation, or cerebral metabolism. In most cases, however, the underlying biologic complexity of epilepsy precludes the identification of the genetic cause, and 65 to 79 percent of recurrent seizure syndromes remain unexplained. Microarray analysis of DNA samples can be a powerful tool for revealing a genetic lesion in well-defined families. We have used this approach in Old Order Amish families, some members of which have a clinical and neuropathological phenotype that we designate as the cortical dysplasia–focal epilepsy (CDFE) syndrome. We identified a genetic variation in the gene encoding CASPR2 in affected patients, a finding that suggests that CASPR2 influences brain development.
Kevin A. Strauss, M.D., Erik G. Puffenberger, Ph.D., Matthew J. Huentelman, Ph.D., Steven Gottlieb, M.D., Seth E. Dobrin, Ph.D., Jennifer M. Parod, B.S., Dietrich A. Stephan, Ph.D., and D. Holmes Morton, M.D.
The mania and the delusions surrounding the genomic overlap of bipolar type I and schizophrenia
S Dobrin, P Stafford, C Zhao. Schizophrenia Research 81, 45-46
A whole genome scan for linkage in 62 Irish pedigrees with bipolar disorder
American Journal of Medical Genetics-A 138 (1), 74-75
F Cassidy, J Badger, C Zhao, S Dobrin, S Roche, P McKeon
No causative DLL4 mutations in periodic catatonia patients from 15q15 linked families
Schizophrenia research 75 (1), 1-3
Abstract: Two well-supported theories of schizophrenia pathogenesis are the neurotransmitter theory and the neurodevelopmental theory, suggesting, respectively, that dysregulation of neurotransmitter signaling and abnormal brain development are causative in this disease. The strongest evidence of neurotransmitter involvement are suggestions of abnormal dopamine signaling in the prefrontal cortex and one of the strongest indications of developmental abnormalities contributing to this disease is an inverse layering of the prefrontal cortex. These two theories of schizophrenia pathogenesis can be united by their involvement of the prefrontal cortex, where structural abnormalities could lead to neurochemical abnormalities. Accordingly, any gene expressed in the prefrontal cortex of developing brains is a functional candidate for schizophrenia. We have previously reported strong linkage to 15q15 (LOD=3.57; P=2.6×10⁻⁵) in a collection of German multiplex families segregating the periodic catatonia subtype of schizophrenia in a nearly Mendelian fashion. A gene within our 15q15 linkage region, DLL4, is expressed in developing forebrain and produces a NOTCH4 ligand. Variants of NOTCH4 are associated with schizophrenia, thus DLL4 is both a functional as well as a positional candidate for schizophrenia. We screened this gene for mutations in three affected individuals and two unrelated controls and found two previously unreported SNPs, one non-synonymous polymorphism that changed an arginine to a histidine in Exon 7 and one synonymous polymorphism in exons. The non-synonymous SNP is a rare variant in that it was not found in 100 control chromosomes; however, it did not cosegregate with the disease in the extended family so it is not causative in this pedigree. It is unlikely that mutations in DLL4 are causative in this collection of families with linkage to 15q15.
DP McKeane, J Meyer, SE Dobrin, KM Melmed, S Ekawardhani, NA Tracy, KP Lesch, DA Stephan
Data Mining Whole-Genome Expression Profiling
S Lal, S Dobrin, D Stephan, Arizona State University
Mapping of sudden infant death with dysgenesis of the testes syndrome (SIDDT) by a SNP genome scan and identification of TSPYL loss of function
Proceedings of the National Academy of Sciences 101 (32), 11689-11694
Abstract: We have identified a lethal phenotype characterized by sudden infant death (from cardiac and respiratory arrest) with dysgenesis of the testes in males [Online Mendelian Inheritance in Man (OMIM) accession no. 608800]. Twenty-one affected individuals with this autosomal recessive syndrome were ascertained in nine separate sibships among the Old Order Amish. High-density single-nucleotide polymorphism (SNP) genotyping arrays containing 11,555 single-nucleotide polymorphisms evenly distributed across the human genome were used to map the disease locus. A genome-wide autozygosity scan localized the disease gene to a 3.6-Mb interval on chromosome 6q22.1-q22.31. This interval contained 27 genes, including two testis-specific Y-like genes (TSPYL and TSPYL4) of unknown function. Sequence analysis of the TSPYL gene in affected individuals identified a homozygous frameshift mutation (457_458insG) at codon 153, resulting in truncation of translation at codon 169. Truncation leads to loss of a peptide domain with strong homology to the nucleosome assembly protein family. GFP-fusion expression constructs were constructed and illustrated loss of nuclear localization of truncated TSPYL, suggesting loss of a nuclear localization patch in addition to loss of the nucleosome assembly domain. These results shed light on the pathogenesis of a disorder of sexual differentiation and brainstem-mediated sudden death, as well as give insight into a mechanism of transcriptional regulation
Erik G Puffenberger, Diane Hu-Lince, Jennifer M Parod, David W Craig, Seth E Dobrin, Andrew R Conway, Elizabeth A Donarum, Kevin A Strauss, Travis Dunckley, Javier F Cardenas, Kara R Melmed, Courtney A Wright, Winnie Liang, Phillip Stafford, C Robert Flynn, D Holmes Morton, Dietrich A Stephan