Mapping Academic Values in the Disciplines
A Corpus-Based Approach
Davide Simone Giannoni
4. Methodology
4.1. Corpus data

The files assembled as described in the previous chapter were grouped by domain and uploaded to a well-known concordancing application, WordSmith Tools (Scott 2006; henceforth WST), for textual data extraction. Once a wordlist (WL) had been generated for each set of texts, a number of quantitative parameters (as listed in WST's Statistics window) could be obtained across the corpus, which totals almost one million words. Detailed data for each text are listed in Appendix 2, while the table below shows how these parameters vary across disciplines.

Discipline   Tokens  WL tokens   Types  TTR (%)  STTR (%)  Mean sentence length (words)
ANTH         85,958     81,547   7,004     8.59     36.42  23.52
BIO          79,891     74,199   6,495     8.75     35.52  26.49
CS          138,038    133,408   8,244     6.18     37.20  23.87
ECO         106,370    103,399   7,220     6.98     36.63  24.28
ENG          69,885     65,706   5,765     8.77     34.28  26.44
HIST        106,393    105,399  11,757    11.15     45.93  25.55
MATH        171,872    155,407   4,925     3.17     24.71  16.21
MED          39,973     36,846   3,837    10.41     33.29  30.18
PHY          63,032     59,325   3,646     6.15     31.77  26.77
SOC         124,773    121,311   9,513     7.84     41.26  27.51
Overall     986,185    936,547  33,073     3.53     36.10  23.32

Table 1. Corpus data by discipline.

The first column of figures gives the total number of tokens found in each 10-text set. This is followed by a smaller figure, showing the number of tokens actually used in the WL to calculate...
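To make the relationship between the raw counts and the ratio columns of Table 1 more concrete, the Python sketch below recomputes TTR from the WL tokens and Types figures and models STTR. It assumes that TTR = types / WL tokens × 100, an assumption that happens to reproduce every TTR value in the table, and that STTR is the mean TTR over consecutive 1,000-token chunks, taken here to be WST's default chunk size. The function names ttr and sttr, and the choice to ignore a final incomplete chunk, are hypothetical illustrations, not WST's own implementation.

```python
"""Sketch: recompute TTR from Table 1 and model STTR.

Assumptions (not stated in the chapter): TTR = types / WL tokens * 100,
and STTR = mean TTR over consecutive 1,000-token chunks.
"""

# (WL tokens, types) per discipline, copied from Table 1.
TABLE_1 = {
    "ANTH": (81_547, 7_004),
    "BIO": (74_199, 6_495),
    "CS": (133_408, 8_244),
    "ECO": (103_399, 7_220),
    "ENG": (65_706, 5_765),
    "HIST": (105_399, 11_757),
    "MATH": (155_407, 4_925),
    "MED": (36_846, 3_837),
    "PHY": (59_325, 3_646),
    "SOC": (121_311, 9_513),
    "Overall": (936_547, 33_073),
}


def ttr(wl_tokens: int, types: int) -> float:
    """Type/token ratio as a percentage, rounded to two decimals."""
    return round(100 * types / wl_tokens, 2)


def sttr(tokens: list[str], chunk: int = 1000) -> float:
    """Mean TTR (%) over consecutive full chunks of `chunk` tokens.

    Hypothetical choices: a trailing chunk shorter than `chunk` is
    ignored, and a text with no full chunk raises ValueError.
    """
    full = len(tokens) // chunk
    if full == 0:
        raise ValueError(f"need at least {chunk} tokens, got {len(tokens)}")
    # TTR of each full chunk: distinct forms / chunk size * 100.
    ratios = [
        100 * len(set(tokens[i * chunk:(i + 1) * chunk])) / chunk
        for i in range(full)
    ]
    return sum(ratios) / full


if __name__ == "__main__":
    for domain, (wl_tokens, types) in TABLE_1.items():
        print(f"{domain:8} TTR = {ttr(wl_tokens, types):5.2f}")
```

Run as a script, it prints the recomputed TTR for each discipline. The sttr function needs the running tokens of a text, which Table 1 does not provide, so its output cannot be checked against the STTR column here; the comparison does suggest why MATH, with many repeated tokens, has both the lowest TTR and the lowest STTR.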