Rethinking East-Central Europe: family systems and co-residence in the Polish-Lithuanian Commonwealth

Volume 1: Contexts and analyses – Volume 2: Data quality assessments, documentation, and bibliography


Mikołaj Szołtysek

This book reconstructs fundamental aspects of family organization across historical Poland-Lithuania, one of the largest political entities in early modern Europe. Using a plethora of quantitative measurements and demographic microsimulation, the author captures and elucidates the complex patterns of leaving home and life-cycle service, marriage and household formation, along with domestic group structures and living arrangements among different subpopulations of Poland-Lithuania, highlighting a variety of ways in which these patterns were nested in their respective local and regional contexts. By showing that at the end of the 18th century at least three distinct family systems existed in the Polish-Lithuanian territories, Szołtysek challenges a number of orthodoxies in the ‘master narratives’ on the European geography of family forms of F. Le Play, J. Hajnal, P. Laslett, and their followers. Volume two of the book contains an extensive bibliography along with a thorough archival documentation of the census-like microdata used in the book, and provides detailed information on their quality and further technicalities pertaining to data analysis.

1. Appendix 1: Data quality assessment


‘Immaculate materials are non-existent even in contemporary statistics’

(Witold Kula 1951, 96)

1.1  Introduction

More than in many other sectors of demography, researchers studying past populations (and especially populations of the so-called ‘prestatistical age’)1 need to use data that are often very rough, imprecise, or fragmentary. This problem has led demographic historians to pay special attention to the tasks of data assessment and checking. These practices have become so important that they are now considered ‘the cornerstone of research in historical demography’ (Henry 1968a; Hollingsworth 1968; Dupâquier 1974, 9; Del Panta et al. 2006, 597–598).

Indeed, many experts on early modern statistical materials would be inclined to admit that the ‘18th century is still a sheer jungle of uncertainties and traps,’ and that ‘the statistical materials of the feudal era differ substantially from those of later ages in that they were collected haphazardly and analysed without skill; as a result, they usually embrace just part of the phenomenon which they refer to, which makes them incomplete’ (Kula 1951, 96; similarly Gieysztorowa 1971, 558–561). In light of these problems, experts have warned scholars ‘to be constantly on the alert’ when dealing with early modern statistics, and to evaluate sources very carefully (Gieysztorowa 1971).

Therefore, it seems naive to expect that the normative reality recounted in the historical Polish-Lithuanian listings described in Ch. 2 would be fully mirrored in the surviving statistical materials. After all, population registration and records of the composition of residential groups were not meticulously kept at that time. Instead, the accuracy of these records varied depending on the individual predispositions and inclinations of the priests and estate managers responsible for maintaining them, as well as on the attitudes of the respondents themselves, many of whom were illiterate, and who may not always have been keen to disclose who was living with whom in their houses, huts, or hearths (Kuklo 2009, 70–71; Gieysztorowa 1976, 123 ff., 139). As a result, the problems that led to omissions and misreports – e.g., faulty census administration, low levels of education, inaccessible places of residence, fears about confidentiality, and extended enumeration periods – must have been much more severe in the 18th century than they were in modern enumerations (Plakans 1984a, 24 ff.; cf. also Ruggles and Brower 2003). Therefore, we cannot assume that early population listings were absolutely accurate in terms of either the population sizes they captured or the internal demographic structure they described2.

In the following sections, we examine the quality of the enumerations in census microdata from historical Poland-Lithuania, describing in detail their limitations and assessing four distinct drawbacks inherent in them: (1) lack of internal consistency of enumeration schedules, (2) missing information on individuals’ characteristics, (3) underenumeration, and (4) misreporting. Our attempts at determining the nature and extent of these errors are guided by a pragmatic question: to what extent do these data deficiencies distort the picture of late 18th-century co-residence patterns in the vast territories of the Commonwealth that was presented in the preceding chapters?

However, it should be emphasized that not all of the methods commonly used to estimate undercounts in historical and contemporary census schedules (United Nations 1952; 1955; Himes and Clogg 1992; Hobbs 2008) were applicable in our case. First, we had only very limited opportunities to check our data against independent sources. At the same time, a case-by-case cross-identification of enumerated persons and their demographic characteristics in independent sources was largely impossible. Even if parochial vital statistics could have been found for all of the locations under investigation, verifying the personal data of nearly 160.000 individuals would have been beyond the capabilities not just of one, but of several people. Thus, for the most part, the problems considered in the following sections are those of evaluating the accuracy of single census counts without reference to demographic data, other than data from the censuses themselves.

Before proceeding, it should be pointed out that most of the micro-censuses used here clearly did not fulfill the simultaneity condition typical of contemporary listings; i.e., they were not taken on a precisely determined day (Thorvaldsen 2006). The Status Animarum books, as well as the communion books, often took up to several months to prepare, even though officially the registers were supposed to be compiled during the Christmas holiday period (Kumor 1967; cf. Ch. 2). Similarly, while the 5th Revision in the European part of Russia was supposed to have taken place between June 23, 1794 and the beginning of January 1796, the actual enumeration process was subject to various local factors (Dèn 1902, 105–106; Kabuzan 1963, 120). The majority of the Civil-Military Commissions’ listings were prepared between January and December of a given year, although the actual duration of the census-taking varied considerably3.

1.2  Lack of internal consistency of enumeration schedules, and the lodging problem

According to Van de Walle (2006, xxxii ff.), there are three main issues which commonly crop up when trying to capture the household in a survey or a census: the boundary problem, the headship problem, and the membership problem. As residence units are most often clearly indicated in our data by explicit reference to separate houses (dymy; i.e., hearths), the first of the issues does not seem to be problematic in our case (see Ch. 2.8). The problem of headship is only partly pertinent to us. The head (or the ‘owner’ or ‘tenant’) of the house is commonly clearly indicated as the first person on the list, independent of age or gender, and he or she is the person to whom all of the inhabitants of the house are linked, with clear depiction of their relationships. Moreover, the instructions – at least those for the Commissions’ censuses and Status Animarum – made it clear that the first person on the list should normally be the pater familias, or the household member in whom the greatest degree of control was vested, and not just a symbolic figurehead. Although we used a rigorous preselection process to eliminate micro-censuses with clearly inadequate information or an improper delineation of residential units, the final data assemblage is not entirely free of certain types of inconsistencies. Accordingly, in what follows our focus is on the third problem anticipated by Van de Walle, even though owing to the structure of our data it is pertinent primarily with regard to one particular type of residential members, i.e., lodgers.

In most of the listings, there were blocks of names representing domestic groups in which all of the individuals were identified by their surnames and their household position, and their kinship ties to each other were clearly depicted (Ch. 2). Unfortunately, in some of the listings the authors were insufficiently meticulous in recording the internal relationship ‘architecture’ of domestic groups. They either could not neatly translate the intra-domestic relationships into one of the coding schemes available to them, or they had to cope with having only limited, imprecise, and perhaps even inaccurate information from the local respondents. Particularly challenging in this regard were cases in which the blocks of names of individuals did not make clear what their positions were within the residence group; e.g., who was a child of the household head, or to which subunit of a domestic group a given servant belonged4. Nevertheless, the majority of ambivalent cases could be definitively resolved by a meticulous reconstruction of the ‘language’ of a given listing, and by understanding the mechanisms governing the registration of persons.

A factor which additionally complicated the precise ranking of individuals within their domestic groups was the highly complex structure of many residential communes, which sometimes consisted of multiple inhabitants bound by complicated kinship ties. A set of listings from the Chełmska land (region 8) stands out for the meticulousness with which they were compiled. In most Belarusian revision lists, there was also considerable attention to detail in the depiction of relationships between individuals, even within residential groups made up of a dozen or more inhabitants. This last source, however, presented a different set of difficulties: namely, inmates and lodgers without independent households were listed with their family (if they had one) at the end of each survey. Since there was no indication of with whom these individuals lived in the village, they had to be assigned at random to the existing householding families in the village5.

In several separate listings, largely from the eastern Ruthenian regions (8–10), the enumerations sometimes recorded the existence of individuals not related to the head’s family, but they did not identify these individuals as either inmates/lodgers or servants. In these situations, we had to make several assumptions based on age and marital status in order to assign these individuals to one of two categories of domestics. While unmarried individuals under age 25 were treated as providers of domestic labor (servants), older people who were co-residing with their families were assigned to a separate category. In an analysis of household composition, these subtenants were included in the category labeled ‘inmates,’ along with representatives of komornicy, Hausleute, Inwohnern, bobyli, poturznicy, podsusiedki, and spólnicy known from other regions. It is important to bear in mind, however, that in some Ruthenian areas, subtenants could have rather specific property relationships within a domestic group6.
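The assignment rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually coded for the database: the function name and the fields `age`, `married`, and `with_own_family` are hypothetical, and because the text does not say how cases outside the two stated conditions were handled, the sketch leaves them unresolved.

```python
from typing import Optional

# From the text: unmarried persons under age 25 were treated as servants.
AGE_THRESHOLD = 25


def classify_unrelated(age: Optional[int], married: bool, with_own_family: bool) -> str:
    """Assign a non-kin co-resident of unstated role to 'servant' or 'inmate'.

    Hypothetical helper: only the two conditions named in the text are
    implemented; every other combination is returned as 'unresolved'.
    """
    if age is None:
        return "unresolved"  # the rule cannot be applied without an age
    if not married and age < AGE_THRESHOLD:
        return "servant"     # young and unmarried: provider of domestic labor
    if age >= AGE_THRESHOLD and with_own_family:
        return "inmate"      # older and co-residing with own family: subtenant
    return "unresolved"      # not covered by the rule as stated in the text


print(classify_unrelated(19, married=False, with_own_family=False))
print(classify_unrelated(40, married=True, with_own_family=True))
```

Returning an explicit 'unresolved' label, rather than forcing every case into one of the two categories, keeps the undocumented part of the rule visible.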

The issue of the lodging population is important from a methodological perspective. Some lodgers formed families and had offspring, and sometimes even hired servants; others, frequently advanced in age, lived alone with householders (see Ch. 10). It has recently been argued that while this latter category of individuals should be seen as bound to householders by some form of economic cooperation or consumption community, lodgers with families should be treated as separate production and consumption units, or as independent households (Kuklo 2009, 155; Janczak 2000, 126; Orzechowski 1956, 121). While conceptually reasonable (e.g., Kochanowicz 1981, 75), this distinction remains tentative,7 and cannot be tested against the historical census records that are available to us. The prime unit of observation in the census microdata discussed here was a co-residential entity formed by a group of people sharing living quarters in the rural countryside (without necessarily sharing all household activities). Accordingly, we treated all of the lodgers attached to the group of people inhabiting a house or hearth as members of a joint co-residential unit. This designation appears to be in line with many statements by the census-takers, who noted the residential integration of lodgers into the primary domestic units to which they were attached8.

1.3  Missing information on individuals’ characteristics

One of the pervasive problems of historical population surveys is missing, partial, or incomplete information on individual characteristics, such as age, marital status, social category, and relationships to other co-resident individuals. As we have emphasized in previous sections of this chapter, the extent to which local record-keepers complied with the formal rules for conducting enumerations varied considerably.

1.3.1  Age non-reporting

Age is one of the most important variables in demographic analysis, as many demographic indicators are based on age (Ewbank 1981). Fortunately, age was recorded for 95 percent of all of the individuals included in the present data collection (i.e., for 147.953 persons). Individuals for whom this information was not provided (7.865 individuals) were mainly concentrated in 13 parishes/estates spread across five regions (Table 1)9. For the remaining 221 local surveys, the shares of the population for whom there was no information on age were, on average, less than half of one percent. Only four out of 12 regional groupings had rates of age non-reporting that were above 2.5 percent. The only area in which this was a serious problem was region 9 in Red Ruthenia, where information on age was missing for nearly one-third of the population. The three remaining groupings (regions 2, 6, and 7) had much lower rates of age non-reporting. Problems with age reporting were generally more widespread in the listings from territories west of Hajnal’s line. Cases of missing age information in these western parishes accounted for almost two-thirds of the people for whom there was no age indicator. Meanwhile, four major regions in the east – despite having an overall population that was larger than that of the western cluster – had only a 14 percent share10. These east-west disparities are attributable to the more diverse set of sources used for the western territories, as the information that was recorded in these sources varied. Thus, these differences cannot be seen as a reflection of spatial variation in individual response rates or in the competence of the census-takers (see further sections).

Table 1: Spatial distribution of age non-reporting by regions of Poland-Lithuania.
Regions        Total observations (=100%)    % with age missing
Region 1                 2.543                      0.5
Region 2                 5.763                     18.1
Region 3                13.320                      0.4
Region 4                 8.358                      0.1
Region 5                 9.945                      0.3
Region 6                14.371                     16.3
Region 7                12.265                     12.9
Region 8                25.193                      2.4
Region 9                 5.526                     30.3
Region 10               14.026                      0.3
Region 11N              19.176                      2.2
Region 11S              25.332                      0.2

Source: CEURFAMFORM Database.

Includes enumerations in which age was entirely omitted in the survey.
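The figures in Table 1 can be checked against the totals quoted in the text. The short Python sketch below is an illustrative consistency check, not part of the original analysis: it rebuilds approximate counts of missing ages from the rounded percentages, so the results agree with the published totals (147.953 + 7.865 = 155.818 persons) only up to rounding. The variable names are hypothetical, and the grouping of regions 1–7 as ‘west’ follows the usage later in this appendix.

```python
# Table 1 as printed: region -> (total observations, % with age missing).
# The source writes counts with "." as a thousands separator (2.543 = 2543).
TABLE_1 = {
    "1": (2543, 0.5), "2": (5763, 18.1), "3": (13320, 0.4),
    "4": (8358, 0.1), "5": (9945, 0.3), "6": (14371, 16.3),
    "7": (12265, 12.9), "8": (25193, 2.4), "9": (5526, 30.3),
    "10": (14026, 0.3), "11N": (19176, 2.2), "11S": (25332, 0.2),
}

total = sum(n for n, _ in TABLE_1.values())

# Approximate counts only: the printed percentages are rounded to one decimal.
missing = {region: n * pct / 100 for region, (n, pct) in TABLE_1.items()}
missing_total = sum(missing.values())
overall_rate = 100 * missing_total / total

# Regions whose rate of age non-reporting exceeds 2.5 percent.
above_threshold = [region for region, (_, pct) in TABLE_1.items() if pct > 2.5]

# Share of all missing ages found in the western regions (1-7).
west = [str(i) for i in range(1, 8)]
west_share = 100 * sum(missing[region] for region in west) / missing_total

print(total, round(missing_total), round(overall_rate, 1))
print(above_threshold, round(west_share, 1))
```

Under these assumptions the reconstructed totals match the text: about five percent of ages are missing overall, four regions exceed the 2.5 percent threshold, and the western regions account for a little under two-thirds of all missing ages.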

Both within and between enumerations which normally provided information on age, the levels of age non-reporting varied between the domestic group categories (Table 2). Groupings 6 and 9 were particularly problematic. In those listings ages were not given for almost 50 percent of the individuals in some categories, mostly household heads and their wives. In region 9, age non-reporting was widespread in almost all categories of inhabitants with the exception of offspring. In the remaining 10 regions the quality of the data was much better, as the shares of the population recorded with missing ages seldom amounted to more than two percent. In these listings, no trend toward a higher rate of non-reporting for particular residential subgroups could be found. Two regions did, however, stand out: region 2, which had significant numbers of age omissions in most categories; and region 7, which had a distinctly lower age registration rate among non-relatives than among other domestic group members. For the complete collection, however, the rate of age non-reporting never went above 10 percent within a particular group of household members. Interestingly, in the present collection age non-reporting did not appear to be more prevalent for servants and lodgers than for other domestic group members.

The patterns of age non-reporting found do not represent serious impediments to the analysis of domestic group structures and individual living arrangements (see, however, later sections on age heaping)11. As age non-reporting was widespread in only a small group of parishes located in a few regions, the issue affected only a small share of our total observations. Even after the most problematic localities were excluded from our calculations, we still had sufficient data to conduct a comprehensive statistical analysis which took into account age at the regional level, including calculations of the singulate mean age at marriage (Hajnal 1953). For region 9, the only region in which the listings were seriously distorted by deficiencies in age registration, we excluded most of the records from a range of analyses which relied upon the age variable. For analyses that focused on other aspects of living arrangements, we were able to conduct research using the more flawed parts of the data collection.


Table 2: Share of age non-reporting by category of domestic group members, by regions of Poland-Lithuania.


Source: CEURFAMFORM Database.

Excludes 5 enumerations which entirely omitted ages in the survey.

Note that the ranking of region 7 has changed substantially due to the elimination of surveys which did not include information on age.

1.3.2  Marital status omissions

Distortions in information on marital status were dealt with in a similar manner. Overall, these distortions appear to have been relatively minor (see Table 3). Marital status was either not stated explicitly in the listings or could not be inferred from family circumstances at the census-taking for just 3.8 percent of all of the individuals registered in the database (5.874 persons)12. The share of people for whom there was no information on marital status was below seven percent in all but one of the 12 regions. The rates of non-reporting were generally higher in the western than in the eastern territories. The rate fluctuated at around four to six percent in the west, and it was twice as high in Greater Poland (region 2). By contrast, none of the Polish-Lithuanian regions in the east had a rate of marital status non-reporting higher than 3.1 percent.

There were no clear signs of gender-selective distortions in marital status omissions. The rates of non-reporting were low among both men and women (in total 4.1 percent among females and 3.5 percent among males), and females were only slightly overrepresented among those with a non-specified marital status (53.5 percent were women and 46.5 percent were men). This gender balance diminished, however, when omissions for males and females were inspected over the life span. The patterns diverged quite early in the life course due to the high proportion of young males with unspecified characteristics13. However, whereas the rates of non-reporting among the men fell to between two and six percent after the age of 34, the share of women for whom the marital status was not reported increased systematically with each consecutive age group, reaching 15 percent among the eldest women.


Table 3: Marital status non-reporting by regions of Poland-Lithuania.


Source: CEURFAMFORM Database.

More clues regarding the selective nature of marital status non-reporting came from an inspection of different categories of co-resident people. In the west (regions 1–7), around 80 percent of all of the cases of marital status non-reporting among men involved marginal members of domestic units, such as servants and lodgers14. Around 40 percent of the women in these two groups had an unspecified marital status. Among men the pattern was different: marital status was unspecified for 50 percent of male servants, but for only 10 percent of lodgers. Female heads and other kin in the west were also strongly affected by non-registration (30 and 55 percent, respectively), although these individuals constituted only a small share of the overall number of incomplete records.

The distribution of incomplete observations among the various categories of household members was different in the east. In addition to servants and lodgers, relatives of household heads were also likely to have been affected by non-reporting, but this compositional shift was primarily due to the fact that there were more people in the latter category in the east than in the west. Still, marital status non-reporting was most widespread among marginal members of domestic groups, with servants more affected than lodgers, and women more affected than men (50 and 20 percent, respectively, within each category). It should also be noted that the marital status was missing for only around 10 percent of the relatives of household heads in the eastern regions.

The situation described above has further implications. First, the inconsistencies in the registration of marital status had only very modest consequences for our analysis of living arrangements. Since the basic principles of the reconstruction of residence patterns were based on the concept of the conjugal family unit (CFU) and not of the marital unit (see section 2.11 in vol. 1), all of the individuals, irrespective of their marital circumstances, could be assigned to a living arrangement category, provided we had information on whether they were living with a spouse or with children, or whether they were living alone. The application of the ‘dyadic approach’ to the study of co-residence patterns makes the problem of the distorted registration of marital status less relevant.

The situation became more challenging when we started studying the entry into marriage, and especially when we began calculating the singulate mean age at marriage, a widely used indirect measure of the mean age at marriage. Several conservative assumptions needed to be made before we could proceed with this analysis, all of which are discussed in detail in the analytical Ch. 8. Here we only discuss the extent to which these different assumptions affected the precision of our estimates.

The computation of the singulate mean age at marriage required us to extract or calculate from the census the age-specific proportions of people who were single (‘never-married’), using the seven quinquennial age groups 15–54 (Schürer 1989). It was in this very preparatory stage that we were most likely to encounter difficulties, as we attempted to impute the definite marital conditions for individuals for whom the marital status was sometimes not precisely registered. Inconsistencies in marital status registration in our data left us no other option but to assign the individuals with incomplete characteristics to various hypothetical categories and then to carefully inspect the outcomes yielded by those different allocation rules. The proportions of never-married people were estimated in four different ways (Figure 1). First, only those individuals who were mentioned in the surveys as being bachelors or spinsters were treated as never-married, while those for whom the marital status was not provided in the sources were treated as currently married or widowed (‘ever-married’). These assumptions yielded estimates which should have represented the minimum proportions of never-married individuals in the different regions (see scores for ‘Minimum’). Another approach was used in order to reach the other end of the spectrum of possible results. In addition to counting those listed as bachelors and spinsters as never-married, we counted all of the individuals for whom the marital status was unknown, or who were living with no spouse but with children, as being never-married. This provided us with the highest theoretically possible rates of celibacy (‘Absolute maximum’ in Figure 1).

Both of these propositions are quite unrealistic, as it cannot be assumed that the individuals who had an unknown marital status were either all married (or widowed) or all celibate. In order to account for these ambiguities, two intermediate categories were added. In the first (‘Low medium’) category, all of the individuals without an indication of marital status were counted as ever-married, except for servants, who were assumed to be unmarried unless it was otherwise stated in the census. In the second category, all of the individuals without an indication of marital status were counted as never-married (this would include servants), except for the individuals who were living with no spouse but with children, who were treated as ever-married (‘High medium’).
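The four allocation rules can be stated compactly as a decision function. The Python sketch below is a minimal illustration under stated assumptions rather than the code used for the database: the `Person` record, its field names, and the rule labels are hypothetical, and the ‘Absolute maximum’ rule is read here as counting every person of unknown status as never-married, including those living with children but no spouse.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    # Hypothetical record layout; status is None when the source is silent.
    status: Optional[str]        # "single", "married", "widowed", or None
    is_servant: bool = False
    lone_parent: bool = False    # living with children but with no spouse


def never_married(p: Person, rule: str) -> bool:
    """Return True if the person counts as never-married under the given rule."""
    if p.status is not None:
        return p.status == "single"   # a stated status is always taken at face value
    if rule == "minimum":
        return False                  # unknown -> ever-married
    if rule == "low_medium":
        return p.is_servant           # unknown -> ever-married, except servants
    if rule == "high_medium":
        return not p.lone_parent      # unknown -> never-married, except lone parents
    if rule == "absolute_maximum":
        return True                   # unknown -> never-married (assumed reading)
    raise ValueError(f"unknown rule: {rule}")


def share_never_married(people: List[Person], rule: str) -> float:
    """Proportion of never-married persons in a group under the given rule."""
    return sum(never_married(p, rule) for p in people) / len(people)


# Invented example group: two stated statuses and three unknowns.
group = [
    Person("single"),
    Person("married"),
    Person(None, is_servant=True),
    Person(None, lone_parent=True),
    Person(None),
]
for rule in ("minimum", "low_medium", "high_medium", "absolute_maximum"):
    print(rule, share_never_married(group, rule))
```

In practice the share would be computed separately for each sex and five-year age group, since those are the proportions the singulate mean age at marriage requires.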


Figure 1: Never-married by different allocation rules, by regions of Poland-Lithuania.

Source: CEURFAMFORM Database.

As expected, the shares of never-married people generated by the first two extreme assumptions differed significantly by five to 10 percent, particularly in the regions of western Poland-Lithuania. Regions 2 and 6 were the most affected by these contrasting definitions. The two intermediate categories appear to have generated more realistic estimates. The first approach generated a value in the middle of these two extremes, especially for the majority of the western regions. The values produced by the second approach were closer to the upper limits of the estimates of never-married individuals. In all of the eastern datasets except for region 9, using various ways to impute marital conditions to individuals with an unknown marital status yielded no significant differences in the final estimate, and in region 11S the different calculations produced almost the same results.

These different estimates of the shares of never-married individuals in turn produced different estimates of the singulate mean age at marriage. These estimates are described in greater detail in Ch. 8, where we present and discuss our results, and attempt to explain why some assumptions were favored over others.
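For orientation, the standard form of Hajnal's (1953) measure can be written out in a few lines. The Python sketch below assumes that standard formulation and is not taken from Ch. 8: the proportions single in the seven age groups from 15–19 to 45–49 give the years lived single before age 50, and the 50–54 group is used together with 45–49 to estimate the share still single at 50. The function name and the input proportions are hypothetical.

```python
from typing import Sequence


def smam(prop_single: Sequence[float]) -> float:
    """Singulate mean age at marriage from proportions never-married.

    prop_single holds eight proportions for the five-year age groups
    15-19, 20-24, ..., 45-49, 50-54 (standard Hajnal formulation, assumed here).
    """
    if len(prop_single) != 8:
        raise ValueError("expected eight five-year age groups, 15-19 to 50-54")
    # Years lived single up to age 50: 15 years of childhood plus five years
    # for each of the seven age groups, weighted by the proportion single.
    years_single = 15.0 + 5.0 * sum(prop_single[:7])
    # Proportion still single at exact age 50, averaged from the two groups around it.
    single_at_50 = (prop_single[6] + prop_single[7]) / 2.0
    if single_at_50 >= 1.0:
        raise ValueError("nobody marries before age 50; SMAM is undefined")
    # Remove the years contributed by those who never marry, then average
    # over those who do marry by age 50.
    return (years_single - 50.0 * single_at_50) / (1.0 - single_at_50)


# Invented proportions, for illustration only.
example = [0.95, 0.60, 0.25, 0.12, 0.08, 0.06, 0.05, 0.05]
print(round(smam(example), 2))
```

Because the measure depends only on the proportions single, running it on the outputs of different allocation rules shows directly how sensitive the estimated age at marriage is to the treatment of unknown marital statuses.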

1.3.3  The omission of surnames

The incomplete registration of individuals’ surnames in some of the censuses in our collection is yet another form of non-reporting which should be mentioned. When using historical census microdata, the surnames may be needed in order to fully reconstruct the kinship and family relationships between co-resident individuals (Ruggles 1995). In this section, we look at the possible implications of these biases for the analysis of family structures and living arrangements.

The non-reporting of last names primarily affected certain categories of household members, such as servants and lodgers. The omission of the surnames of some servants and lodgers may have led to a blurring of kinship relations within some domestic groups. This tended to occur especially in cases in which the kinship and/or affinity links with the head’s family coincided with the specific position an individual occupied within a particular domestic unit (e.g., a farmhand, a lodger, or a retired farmer). Such a situation may have led the record-keeper to define an individual’s position in the household based on one type of relational terminology rather than another (see Cooper and Donald 1995). If this was indeed the case, we may assume that in situations in which the source preferred to determine an individual’s economic position rather than his/her potential kinship relations, knowing the surnames of the marginal household members would be critical for capturing unmistakably the composition of the domestic group15. Generally, we can assume that a person who may have looked like a non-kin servant or a lodger in the census might have turned out to have been a related co-resident if a more complex examination using multiple sources of information had been conducted (Cooper and Donald 1995). It could therefore be argued that the omission of the surnames of some domestic group members (i.e., lodgers) may artificially decrease the rates of co-residence among kin in the analysis. Therefore, there is an expectation of a negative correlation between the rates of co-residence with inmates and with wider kin.

In some of the listings under investigation here, domestic group members were, indeed, identified using several relational terminologies simultaneously, like those based on household position or on kinship. Records of this type could be detected in enumerations from both the western and the eastern regions (e.g., in region 8). In some cases sons and daughters were identified as farmhands; and in other cases brothers, married daughters and their spouses, and even the parents of the head of a domestic group appeared as komornicy (lodgers)16. In most cases, however, houseful members were defined based on their kinship links. On the other hand, an analysis of cases in which functional criteria (komornik, inmate, etc.) were used along with the application of both anthroponyms in the characteristics of individuals indicated that only a very small share of those individuals represented distant relatives of the heads or of their parents17. In the much smaller number of cases in which servants were identified by surnames, convergences of this kind seldom occurred.

One seemingly intractable problem that arose in this context was that without a reconstruction of a wider kinship network between co-residents through the use of genealogical information, the scale of potential biases cannot be precisely estimated. However, these issues could be addressed using a cruder approach that assesses the relationship between the shares of co-resident elderly lodgers and elderly kin in those listings which are known to have registered marginal household members in a consistent manner. If the degree of variation among kin mirrored that among the lodgers, and especially if there was a clear negative relationship between the proportions of lodgers and relatives in a given setting, we may assume that some members of the lodging population were actually relatives18. If substantial numbers of adult and elderly kin were indeed registered as unspecified inmates and lodgers, then parishes in which inmates and lodgers appeared in larger numbers should also have smaller shares of specified co-resident kin than parishes with a lower concentration of inmates.

In Figure 2, the mean number of lodgers per 100 domestic units was plotted against the respective number of kin in 69 parishes of western Poland-Lithuania. This scatterplot shows that the relationship between the two variables was not patterned in any detectable way. The regression line indicates that the relationship between the two variables is positive (that is, contrary to expectations), but the proportion of variability in the data that is accounted for by this simple linear model is trivial. Parishes with comparable ratios of inmates differed considerably in terms of the extent of kin presence in domestic groups, and it becomes clear that the number of lodgers in a given population cannot be used to predict the rate of co-resident kin in an overwhelming majority of our locations.


Figure 2: Scatterplot of the mean number of lodgers per 100 domestic groups against the respective number of kin; western areas of Poland-Lithuania by regions.

Source: CEURFAMFORM Database.

All lodgers and kin are included, regardless of age. Data for 69 parishes. Seventeen parishes were excluded from the pool of western regions due to incomplete registration of age or consistent non-registration of lodgers.
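The check described above, regressing the number of kin per 100 domestic units on the number of lodgers and reading off the share of variance explained, can be sketched as follows. This is only a minimal illustration in Python, not the procedure actually used for Figure 2; the function name `ols_fit` and the six parish values are hypothetical and are not drawn from the CEURFAMFORM Database.

```python
from statistics import mean

def ols_fit(x, y):
    """Simple linear regression of y on x; returns (slope, intercept, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # With a single predictor, R^2 equals the squared correlation coefficient.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical parish-level values (illustration only, not source data).
lodgers_per_100 = [5.0, 12.0, 20.0, 31.0, 44.0, 58.0]
kin_per_100 = [22.0, 35.0, 18.0, 40.0, 25.0, 33.0]

slope, intercept, r_squared = ols_fit(lodgers_per_100, kin_per_100)
print(f"slope={slope:.3f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

A clearly negative slope together with a sizeable R² would be the signature of kin being systematically recorded as lodgers; a slope near zero or positive with a trivial R², as reported for Figure 2, would not.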

Given these observations, it is impossible to conclude that the relatives of householding families have never been ‘hidden’ among unspecified lodgers. What we can suggest, however, is that in our dataset the misreporting of kin as lodgers (if it happened) may have been only sporadic and random, rather than consistent and systematic19. The general assumption that this problem has only very minor implications for the measurement of household composition also arises from our belief that the activities of the record-keepers were grounded in a shared system of meaning in which the distinctions between relatives and strangers would have played a crucial role.

1.3.4  Unspecified kinship pointers

Some of the listings from the CEURFAMFORM collection did not distinguish between kinship through the male and female lines. Brothers, sisters, and parents of the householding couple, as well as more distant relatives, were sometimes recorded without further identifiers indicating whether they had a kinship connection to the head or his wife.

The first step that can be taken in dealing with this problem is to distinguish between the different types of kin relationships to the individuals leading the domestic units (familial links are here considered to be a subset of those relationships), and to decide which of these relationships were most prevalent among the co-resident group members. The completeness with which they were recorded should also be taken into account (Ruggles 1995). For children, grandchildren, and children-in-law, the principal ties with the relevant others were least ambiguous, as they all were always linked to both the head and his spouse20. They should, however, be distinguished from other types of kinfolk whose kinship position in the co-resident group was by definition unilateral: i.e., parents, siblings, siblings-in-law, and wider kin. The next move in this line of inquiry would be to ask how many individuals of this latter group were known to have had their kin relationships clearly defined (either through the head’s or his spouse’s line), and how many relationships were left unspecified. The relevant findings are summarized in Table 4.


Table 4: Relational terms of co-resident relatives by regions of Poland-Lithuania.


Source: CEURFAMFORM Database.

All co-resident relatives other than children, grandchildren, and children-in-law.

We see that in the aggregate over one-quarter of the relevant kinship links were left unspecified. Most of the western regions had a non-reporting rate of 40 percent or slightly higher, with the sole exceptions being Lesser Poland (6) and Silesia (7). However, there were regions in both the west and the east of the country in which more than half of all kin links were unspecified – a share that in some cases exceeded 70 percent. The data collected for the western region 4 and Podolia (region 9) appear to have been the most problematic in this regard. A distinction can be made between two parts of the eastern territories. Regions 8 through 10 (mostly in present-day Ukraine) had many more inconsistencies in the descriptions of kinship links between the domestic group members than the Belarusian regions. These differences are clearly attributable to the differences in the sources used in compiling the listings in these areas: the Russian revision lists in Belarus, and the Civil-Military Commissions or parochial lists of inhabitants in the Ukrainian areas.

These features do not represent insurmountable obstacles to conducting an efficient structural analysis of domestic groups. The very fact that there were particular kinship relations (e.g., the co-residence of a brother with a sister) was sufficient to assign a given co-resident group to one category or another in the typological system of domestic units. The task of reconstructing the structural complexity of some conjugal family units will, however, prove more daunting, and may lead to mistaken interpretations of some individuals’ living arrangements. The analytical difficulty we face in determining the scale of non-reporting of family interrelationship pointers lies in the need to assess how many of the unspecified links might involve the configuration of a kin co-residence structure which might have been altered if a more appropriate means of defining these relationships had been available.

The argument that the incompleteness of information about kinship links recorded in the data may bias our estimates of the distribution of different family types needs to be taken seriously. Consider, for example, a group consisting of the farm head’s family, to which the head’s unmarried brother is attached, along with another relative whose kinship characteristics are, however, limited to a simple designation as a ‘mother.’ Depending on whose parent the person is assumed to be, this residential group would consist of either one or two conjugal family units (a single mother may form a CFU of her own together with her son, who is a head’s brother), and its structural type would be assessed as being either extended or multiple. There is much less uncertainty in situations in which the only missing information is whose parents the co-residing elderly people are. In a domestic group similar to the one described above, but with no brother present, the question of whether the woman is the mother of the head or his spouse does not present a dilemma in the analysis of domestic group structures or living arrangements21.
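The example can be restated as a toy decision rule. The sketch below is a deliberately simplified illustration in Python and not the coding routine applied to the data: the function name `houseful_type` and its arguments are hypothetical, and it assumes that a lone parent and an unmarried sibling form a second conjugal family unit only when both are tied to the same partner.

```python
def houseful_type(parent_side, sibling_side=None):
    """Classify a houseful made up of the head's conjugal family plus one lone
    parent and, optionally, one unmarried sibling.

    parent_side and sibling_side take the values 'head' or 'spouse';
    sibling_side is None when no sibling co-resides.
    """
    # Toy rule (an assumption): the parent and the sibling make up a second
    # conjugal family unit (CFU) only if the parent is also the sibling's parent.
    second_cfu = sibling_side is not None and parent_side == sibling_side
    return "multiple (type 5)" if second_cfu else "extended (type 4)"

# The unspecified 'mother' from the example, with the head's brother present ...
for side in ("head", "spouse"):
    print("brother present, mother of", side, "->", houseful_type(side, "head"))

# ... and with no brother present, where the ambiguity has no consequence.
for side in ("head", "spouse"):
    print("no brother, mother of", side, "->", houseful_type(side))
```

Under these assumptions the structural type depends on the missing pointer only in the first configuration, which is the point the paragraph makes.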

Two approaches have been adopted to evaluate the potential biases. First, 3,941 individuals with unspecified kinship pointers (relationship either to the household head or to his spouse) were examined with respect to their living arrangements. In particular, we looked at the types of domestic groups they were living in, and who their co-residents were. The 2,339 residence units these people inhabited may be taken as representing the absolute maximum number of units which could be altered had the information necessary to properly establish intra-domestic relationships within them been available (8.8 percent of the total number).

Almost 20 percent of the domestic groups with no conjugal family (type 2 according to the Hammel-Laslett scheme), and 15 percent of multiple-family units (type 5) contained at least one person with an unspecified kinship relationship pointer, but the highest concentration of the problematic cases was among the extended domestic groups (Hammel-Laslett type 4; 40 percent share within this group). Any potential changes within the first two categories would have had only a negligible effect on the overall distribution of houseful structures: only a few dozen of these ‘no family’ units could have been changed into simple family housefuls, while in the structural categories a tiny share of multiple-family units (Hammel-Laslett’s ‘fives’) would not have become even more complex according to the standard scheme. On the other hand, if a substantial number of extended domestic groups could be proven to have included second conjugal units built around previously unrecognized kinship links within them, then the number of multiple-family units might have increased in the collection by as much as one-fifth (from a 24 to a 29 percent share of the overall number of domestic groups), particularly in some regions22.
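The ‘one-fifth’ in the preceding paragraph refers to the relative growth of the share of multiple-family units, not to a change of twenty percentage points. A short check in Python, using only the two shares quoted in the text (the helper name `relative_increase` is ours):

```python
def relative_increase(old_share, new_share):
    """Relative change, in percent, from old_share to new_share."""
    return (new_share - old_share) / old_share * 100

# Shares of multiple-family units quoted in the text: 24 and 29 percent.
print(round(relative_increase(24, 29), 1))  # 20.8
```

A rise of five percentage points on a base of 24 percent thus amounts to roughly one-fifth.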

Upon closer examination, however, it turned out that in the overwhelming majority of uncertain cases among people in extended families (almost 90 percent) the respective relative was the only co-resident kin from outside the householders’ conjugal family. Among all of the extended residential groups, cases in which two or more co-resident kin linked through conjugal or parental ties formed an additional CFU were found for just 143 houses (0.5 percent of the entire collection of domestic units). So few cases would not have had any effect on the distribution of the different domestic group structures presented in subsequent chapters.

These results are in line with the outcomes of a more in-depth investigation which examined different types of dyads with missing kinship pointers in an attempt to assess the consequences of incomplete information for the analysis of houseful structure and individual living arrangements. The investigation looked at co-residence patterns involving grandparent(s) with parent(s), parent(s) with sibling(s) (and siblings-in-law), and sibling(s) (also siblings-in-law) with nephew(s) (all categories in relationship to a core family within the domestic unit). All of the cases in which individuals involved in a given dyadic relationship either had the same direction of kinship link (e.g., co-resident mother and brother(s) of the head, or his spouse), or could not produce an additional conjugal/parental link within the domestic group (e.g., co-resident mother of the head, and sister(s) of the head’s wife) were considered unproblematic23. The investigation relied on different combinations of dyads for which the actual relationship between the people involved could not be reconstructed in full, but for which the existence of a parental link could be hypothesized (as in the example discussed above; and in the co-residence of the sister of either the head or the spouse with the nephew of the head).

In addition to engaging in this labor-intensive process of identifying relevant person-cases, we mapped them across structurally different types of domestic units. This showed that the imputation of parental links for individuals with imprecise relational terms would have led to a change in the houseful structure code only for a very small number of residential units. For example, out of 3,263 extended domestic groups in the collection, only 24 would have been reclassified as multiple-family units, and this change would have had no effect whatsoever on the relative shares of the various family types across the regional subsamples. The changes for multiple-family units would have been equally small: only 155 (2.4 percent) of these co-resident groups would have been reclassified from one specific type of multiple-family arrangement to another, and there would have been no relevant modifications in the major sub-classes of types of family structure in the aggregate. It is therefore clear that the incompleteness of the family interrelationship pointers inherent in our data has little effect on the distribution of the various family types.

1.3.5  Preferential reckoning of co-resident kin

In most of the standard residential censuses, the kinship terminology present in the source is usually uni-dimensional, which means that the relationships identified with descriptive terms are those between the members of the group and some important individual, such as the head (Plakans 1984a, 59, 143; Darroch 2000). Since in historical rural populations most of the household heads were men, it is likely that the relatives of the male heads in domestic groups were recorded more thoroughly than the relatives of their spouses. These potential failures of historical listings may be further exacerbated by the omission of the maiden name of the head’s wife, as this can complicate the process of fully identifying the kinship ties among cohabitants24. Theoretically, this differentiation in kinship reckoning in the survey may be most pronounced in agrarian societies where the status of women is usually one of formal dependency and hence structural subordination.

However, tracing signs of these potential biases in surveys of historical rural populations is not as straightforward as it may appear. In societies in which patrilocal household formation patterns dominated, the relatives of the head (usually male) would normally be overrepresented among the co-resident kin. This can be a substantial hindrance when the goal is to distinguish patterns dictated by the objective mechanisms of household recruitment of kin from patterns that arose from a preferential reckoning of relatives within the co-resident kin group in the survey. In this case, the best course of action is to check whether there is a particular pattern in the distribution of the heads’ and the spouses’ relatives in various regional sub-samples. Again, the starting point for approaching this problem lies in the data presented in Table 4 (above).

A clear discrepancy between the proportions of co-resident kin of heads and of their spouses can be found across all of the regional samples. In all 12 regions, there are fewer household members related to the head’s wife than to the head. With the exception of two groupings particularly affected by information deficiency (regions 4 and 9), this gap was less pronounced in the western than in the eastern regions. This may be because the patrilineal, patrilocal, and patriarchal practices found in Ruthenian rural communities were far less predominant in the west (see Ch. 3).

Is it possible that the spouses’ co-resident kin were ‘hidden’ among the relatives with unspecified interrelationship pointers? If we aggregate the data at the parish level, it appears that the variability in the proportion of the head’s spouse’s relatives is indeed negatively related to the variability of the proportion of kin with missing pointers (Pearson’s r = –.290; p>.001). Thus, when the proportion of relatives related to the head’s spouse was high, the proportion of unidentified kin tended to be low, and vice versa. Our assumption is therefore confirmed. This pattern could be observed in the group of parishes with the highest ratios of co-resident kin related to the wife of the head (35 percent and above), and where there were no unidentified kin. In the remaining parishes this pattern could also be detected, albeit to a lesser extent: around eight percent of the variability in the proportions of the spouse’s relatives was attributable to differences in the proportions of unknown kin (R² = .084)25.
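The two statistics quoted above are arithmetically linked: with a single predictor, the share of variance explained is the square of Pearson’s r, and (–.290)² ≈ .084. The Python sketch below spells out the textbook formula and that check; the function name `pearson_r` is ours, and no parish-level data are reproduced here.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Squaring the coefficient reported in the text gives the explained variance.
r_reported = -0.290
print(round(r_reported ** 2, 3))  # 0.084
```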

There are, however, grounds for supposing that the imbalance between the number of co-resident relatives of the head and of the head’s spouse across all regions was caused less by selective misreporting than by objective mechanisms of household membership recruitment which were very closely linked to domestic power relationships26. Significantly, the listings under investigation here showed inverse patterns of household management, depending on whether the household was run by men or by single women (mostly widows) (Table 5). Whereas in units headed by males there was a relative predominance of relatives of the male head of the domestic group (especially in the eastern regions), in female-headed households relatives from the female side clearly outnumbered those from the side of the deceased male spouse27. Again, the gaps were larger in the eastern regions.

Looking at Tables 4 and 5, we can clearly see that the kinship reckoning in our censuses – even if it is conditioned on the pervasive headship principle in registering the relationships – shows signs of interrelationship pointers on the female side. How complete this registration actually was is a question that cannot be unambiguously answered. The inter-parochial variability in registration appears to have been rather incidental, and thus should not be definitively linked to a lower degree of precision in the registration of the individual characteristics of females.


Table 5: Relational terms of co-resident relatives by sex of household head and regions of Poland-Lithuania.


Source: CEURFAMFORM Database.

The somewhat patterned divergence of the western regions from eastern regions suggests that there were clear differences between these two broad territorial and cultural clusters in terms of the density of kinship relationships in the domestic domain (cf. Ch. 1, section 1.4.4). However, any attempt to study kinship links at the domestic group level using these materials should be undertaken with caution, making allowances for possible biases (see Ch. 10).

1.4  Underenumeration

No scholar imagines that a census, even a modern one, can be taken as a complete listing of all members of the population; instead, they acknowledge that some categories of individuals will have been omitted. Scholars have to determine which categories of people might have been excluded, and who actually lived in each domestic unit by counting both the listed and the unlisted members. If underenumeration occurred at random, the problems of data analysis would be substantially reduced, but underenumeration is often selective (Van de Walle 1974, 25–26; Ewbank 1981; Steckel 1991, 581). The following analysis investigates whether there were tendencies to omit or over-represent certain categories of the population in the historical censuses of Poland-Lithuania28.

It is generally accepted that even contemporary population statistics can occasionally be biased due to social customs which may discourage the reporting of certain segments of the population in particular countries (United Nations 1955). It is equally understood that similar problems likely occurred in historical population listings, and were probably more serious than they are today. The reasons why an enumeration was prepared would have strongly influenced the enumerator’s data collection methods (e.g., Berkner 1975, 725; Gieysztorowa 1976, 111). Moreover, peasants may have had their own reasons for concealing the presence of certain ‘souls,’ especially young children (if they were wary of financial consequences related to a child’s baptism or possible burial), married offspring (if they were seeking to evade the state’s policy of splitting extended family collectives; see Szołtysek and Zuber 2009a), and adult sons (if they wanted to shield them from military conscription)29. The poor, the unskilled, and the landless also sometimes went uncounted, as did women in some patriarchal societies (Gruber and Pichler 2002, 354). Lodgers were particularly likely to be missed in enumerations, as they were a fluid element of society who were not always land-bound, and were not in all cases obliged to perform direct feudal duties. Finally, in territories with a high rate of mixed marriages between Roman Catholics and Uniates (as in the parishes from region 9), there is a strong and apparently justified suspicion that some Catholic authors omitted dissenting inmates when compiling the listings. These observations provide grounds for several a priori reasons to suspect that there were errors in the enumerations that affected certain categories of individuals in our listings (see also section 10.7 in Ch. 10).

The character and scope of underestimations in premodern population registers are usually inferred from an analysis of their structure according to age and sex (Willigan and Lynch 1982; Van de Walle 1974, 25 ff.; Gieysztorowa 1976, 91; Kuklo 2009, 130–142; also United Nations 1952; 1955; Hobbs 2008; Poston 2006). The deficiencies of historical population statistics tend to be most pronounced in these areas. Tests of the accuracy of age statistics are particularly relevant, not only because these data are of major importance for population estimates, but also because classification by age and sex is of fundamental importance in demographic analysis (Poston 2006, 19–25; Hobbs 2008, 125–126). Some of the effects of age and sex errors are readily apparent in the statistics, while others can best be observed in the regrouped data, and still others are only revealed by means of various indexes, or by referring to age schedules obtained from model populations (Ewbank 1981, 18). The examination of data from different points of view should increase the likelihood that they will be correctly interpreted. It was therefore necessary to apply several different tests to our own data. The purpose of the following sections is to examine the variation in the accuracy of the age-sex data according to region and type of source, with the goal of highlighting the areas for which difficulties pertaining to the quality of age-sex data must be taken into account in the analysis.

1.4.1  Proportions of minors and of the aged in population

We start by looking at two specific population subsets who are particularly susceptible to defective enumeration. The percentage of children (below age 15) in the overall population is a significant criterion used to assess the value of census materials. As children were usually registered with the least degree of accuracy, their actual share of the population was often underestimated (Hobbs 2008, 159). Similarly, an analysis of the share of the elderly people (aged 65 and older) according to sex in the overall population can also provide us with important information about the extent of potential deficiencies in registration30. In both cases, underregistration would have an impact on our estimations: in the first case by artificially decreasing our measures of mean household or family sizes and the relative size of the offspring (or sibship) group among all of the houseful members; in the second, by decreasing the pool of individuals whose presence in domestic groups might have represented an extension beyond the conjugal family core.
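The two indicators are simple proportions of a listing’s population. A minimal sketch in Python, assuming that every person in a listing has a recorded age in completed years; the function name `age_group_shares` and the ten ages below are hypothetical and serve only to illustrate the age bounds used in the text (below 15, and 65 or older):

```python
def age_group_shares(ages):
    """Return (percent aged 0-14, percent aged 65 and older) for a list of ages."""
    n = len(ages)
    minors = sum(1 for age in ages if age < 15)
    elderly = sum(1 for age in ages if age >= 65)
    return 100 * minors / n, 100 * elderly / n

# Hypothetical listing of ten persons (illustration only).
ages = [1, 5, 14, 15, 20, 30, 40, 64, 65, 70]
share_minors, share_elderly = age_group_shares(ages)
print(share_minors, share_elderly)  # 30.0 20.0
```

Persons with missing ages would have to be excluded or handled separately before such shares are computed.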

In Figure 3, the parish-level data for these two broad age groups are plotted against each other for the complete collection, whereas Table 6 provides regional means for both variables. In the scatterplot, indexes for the minors are largely concentrated within a small range of values of between 35 and 40 percent (52.4 percent of cases). In three-quarters of all of the parishes the share of people aged zero to 14 was above 35 percent, and in one-third of the parishes the share was 40 percent or higher (the 75th percentile was 41.2 percent). Three outlying parishes had shares of minors below 25 percent, and five parishes had proportions larger than 46 percent. The parish-level variability in the proportions of the elderly was even greater. However, some 70 percent of the parishes had shares of two percent or more, 45 percent had shares of three percent or more, and more than one-quarter had shares of four percent or more. Overall, 60 parishes out of 220 had proportions of elderly people below two percent, which seems to suggest that there were deficiencies in the registration of elderly people in some of our locations (see more on that below)31.


Figure 3: Scatterplot of parish (estate)-level proportions of minors against the respective proportions of elderly, all regions of Poland-Lithuania.

Source: CEURFAMFORM Database. Data for 50,918 minors and 3,934 elderly people.

The distribution of data in Figure 3 indicates that there is no clear dependence between the two variables (R² = .02). The parishes with high shares of young people had both lower and higher proportions of the elderly, whereas parishes with high shares of elderly people had widely differing shares of young people. The pair-wise associations between the two variables and the population size of the locality were equally irrelevant. No such dependencies were detected among parishes with smaller shares of elderly people, although parishes from the eastern regions were slightly more prevalent in this group.

Broken down into regions, the data from Figure 3 reveal no clear clustering in space (not presented). However, some areas displayed more internal variation in the observed values of the ‘below age 15’ index than others, which can be seen in some rather significant departures from the most common (six to 10 percent) values of the coefficient of variation (CV)32. This was the case for regions 3 and 7 in the west and for region 11S in the east, as the parochial data for these regions were much more heterogeneous than they were for other groupings (CV values of between 17 and 21 percent; region 11S also had most of the outlying cases shown in the scatterplot). Meanwhile, the parishes in Greater Poland (region 2) and northern Ukraine (region 10) – again, two antipodal territories of the Commonwealth – had the least degree of variation.

The proportions of the aged at the parish level are much less homogeneous within regional clusters. For the ‘65+’ variable, only two regions displayed values of the coefficient of variation below 40 percent (region 10 appears again to have been the most homogeneous). The highest data dispersion factor occurred this time in region 5, located on the southwestern edge of Poland-Lithuania (CV=72 percent).
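For readers who wish to reproduce this kind of comparison, the coefficient of variation is the standard deviation expressed as a percentage of the mean. The Python sketch below assumes the sample standard deviation (n − 1 in the denominator); whether the sample or the population form was used for the figures quoted above is not stated in this section. The function name and the eight values are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean, times 100."""
    return stdev(values) / mean(values) * 100

# Hypothetical parish-level shares of the elderly, in percent (illustration only).
shares_65_plus = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(round(coefficient_of_variation(shares_65_plus), 1))
```

Because the CV is scale-free, it allows the dispersion of the ‘below age 15’ index (with means near 40 percent) to be compared with that of the ‘65+’ index (with means near three percent).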

Table 6: Percent share of minors and elderly by regions of Poland-Lithuania (sexes combined).
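| Region | Minors (aged 0–14), % | Elderly (aged 65+), % |
| --- | --- | --- |
| Region 1 | 35.8 | 3.5 |
| Region 2 | 36.9 | 3.6 |
| Region 3 | 37.5 | 3.1 |
| Region 4 | 40.6 | 3.7 |
| Region 5 | 37.6 | 2.2 |
| Region 6 | 38.4 | 2.2 |
| Region 7 | 38.7 (34)* | 2.9 (4.7)* |
| Region 8 | 37.8 | 2.4 |
| Region 9 | 43.7 | 1.5 |
| Region 10 | 40.6 | 3.0 |
| Region 11N | 38.0 | 3.5 |
| Region 11S | 38.6 | 3.3 |
| Altogether (freq.) | 50,918 (53,784)* | 3,934 (4,383)* |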

Source: CEURFAMFORM Database.

* Includes parishes with potential underregistration of youngsters.

Including both of the variables in the regional estimates (Table 6) enabled us to make use of proportions for communities large enough to be safely considered immune to the random deviations which might have occurred in smaller populations, such as those represented in Figure 3. In the majority of the regions, the share of the population under the age of 15 was around 40 percent, and was even higher in three cases. Regions situated in the extreme east (11N and 11S) and west (1–4) of Poland-Lithuania had shares of elderly people of three percent or above, while five groupings scattered between them had shares below three percent. The share of elderly people did not significantly exceed the 3.5 percent threshold in any of the regions, and it was under two percent in only one of the areas (which was, however, of little representative value). If, following an earlier grouping, all of the regions were divided along Hajnal’s seminal line, we would see a nearly perfect equilibrium between the parameters in the western and the eastern regions (the shares of individuals under age 15 would have been around 38 percent, and the shares of the elderly would have been around three percent). Such a balance, however, may not inspire trust. Relative to the western regions, the eastern regions, and especially those in Belarus, had very high fertility and birth rates but roughly equivalent mortality rates until the end of the 19th century (Fogelson 1938; see also Ch. 5). This leads us to expect that the shares of young people would have been higher in the east. We will address this issue in more detail in later sections of this chapter.

The question arises as to how the above data should be interpreted. In historical populations, the age structure of the population was – except during special periods or intensive migratory movements – mainly shaped by crude birth rates. In societies in which the primary demographic characteristics were high fertility and a relatively short average life expectancy (features typical of natural fertility regimes), a high percentage of the population would have been young (Rowland 2003; Keilman 2010). However, nuptiality, which played an important role in determining European fertility before the late 19th century, significantly affected the composition and growth rates of a population (Wrigley and Schofield 1981; Coale and Treadway 1986, 47). Moreover, some differentials in life expectancy existed even among pre-transitional populations (Ediev and Gisser 2007).

The data presented in Figure 3 and Table 6 can be seen as the product of this complex interrelationship of vital events and nuptiality. Unfortunately, the question of how important each of the components of this interrelationship was has been only partially answered by scholars of historical Polish-Lithuanian populations, and remains a matter of dispute (Kuklo 2009). Thus, efforts to determine to what extent the data distributions presented above reflect actual population structures, and to what extent they reflect deficiencies in population recording, have not been fully successful. We will address this difficulty in our assessment of the reliability of the population age statistics in a number of ways, including comparing our estimates with estimates found elsewhere in the scholarship on the demographic conditions prevalent on Polish-Lithuanian lands, both before and during the demographic transition. Next, we will compare indices calculated from our material with comparable information from other extensive datasets available for rural populations of central and northern Europe from the period 1740–1846. Finally, the available exogenous datasets from various Polish lands, the chronology of which approximates that of most of our material, will be used as a reference point in comparative analyses.

In the Polish expert literature a set of assumptions was applied to test the credibility of the listings from the pre-statistical era. Starting from the premise that natural fertility and high mortality regimes prevailed on Polish-Lithuanian lands until the end of the 19th century, Gieysztorowa – along with many other scholars – assumed that people aged 0–14 should have made up over 40 percent of the total population, and that a smaller share could be seen as proof of the incompleteness of an enumeration (Gieysztorowa 1976, 133, 100–101; also Kuklo 2009, 133; Kędelski 1990, 72–73; Janczak 1975, 40–41; Rusiński 1970, 72–76; Borowski 1963, 82)33. She further posited that in an era with limited life spans the share of older people (65+) should not have exceeded three to 3.5 percent (Gieysztorowa 1976, 96–101). The assumption that having a share of young people that was 40 percent or higher was a criterion for assessing the quality of a listing registration stemmed from both theoretical premises and from the comparisons with later 19th-century aggregate census data from Polish lands34.
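These rule-of-thumb criteria are easy to apply mechanically to the regional means. The Python sketch below uses the shares as read from Table 6 (first values only, without the starred alternatives); the function name `regions_meeting` and its default cut-offs of 40 and 3.5 percent are our shorthand for the thresholds discussed above, not a procedure taken from the literature cited.

```python
# Regional shares as read from Table 6: (percent aged 0-14, percent aged 65+).
TABLE_6 = {
    "1": (35.8, 3.5), "2": (36.9, 3.6), "3": (37.5, 3.1), "4": (40.6, 3.7),
    "5": (37.6, 2.2), "6": (38.4, 2.2), "7": (38.7, 2.9), "8": (37.8, 2.4),
    "9": (43.7, 1.5), "10": (40.6, 3.0), "11N": (38.0, 3.5), "11S": (38.6, 3.3),
}

def regions_meeting(table, min_minors=40.0, max_elderly=3.5):
    """Return (regions with enough minors, regions above the elderly ceiling)."""
    enough_minors = [r for r, (minors, _) in table.items() if minors >= min_minors]
    above_ceiling = [r for r, (_, elderly) in table.items() if elderly > max_elderly]
    return enough_minors, above_ceiling

enough_minors, above_ceiling = regions_meeting(TABLE_6)
print("share of minors at 40 percent or more:", enough_minors)
print("share of elderly above 3.5 percent:", above_ceiling)
```

Applied in this way, the criteria are only a first screen: a regional mean below the cut-off may reflect genuine demographic differences rather than underregistration, which is why the text goes on to compare the figures with external datasets.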

According to Gieysztorowa, in eight departments of the Duchy of Warsaw (covering a tiny part of the by then defunct Polish-Lithuanian Commonwealth) in 1810, the shares of minors were 40 percent, and the shares of elderly people (aged 65+) were 3.6 percent (Gieysztorowa 1976, 96) – parameters that closely resemble those of our data. In Szulc’s computations for the period less than a century later – and on the eve of the demographic transition – the share of young people was 39.3 percent on Polish lands, while the share of people aged 65+ was only 3.6 percent (Szulc 1936, tab. 13, 15–16; data for the years 1897–1900)35. Again, these numbers closely resemble our estimations (similarly Kuklo 1998, 48–49).

Rosset (1964, 233) assumed that the constant and systematic increase in the proportion of elderly people noted on Polish lands from the middle of the 19th century started from levels that only slightly exceeded the three-percent threshold. The general proportions of elderly people calculated from our materials do not, therefore, raise suspicion. Underenumeration may have occurred in four regions with an index level below three percent, though, following Rosset (1964, 210), it could be argued that a proportion of the elderly even below the two percent level is theoretically possible under ‘demographically primitive conditions.’ What is peculiar is that in reference to conditions prevalent in central Poland around the middle of the 19th century (within the borders of the Congress Kingdom) Rosset mentioned an ‘unusually low proportion of old people, which is a typical symptom of extreme demographic backwardness’ (Rosset 1964, 231).

The range of possibilities regarding the observed proportion of minors and elderly in historical populations can be illustrated with Table 7, in which the statistics obtained from our material are compared side by side with information from seven other datasets. The results obtained from our data clearly fall within the range of values in statistics from other regions, and they represent neither the lowest nor the highest values for either variable. The differences between our data and those for the German territories appear justified once we take into account the generally higher age at marriage in the German parishes, as well as the likely more favorable conditions in terms of predicted life expectancy (Szołtysek 2003, 124–155; Knodel 1988, 53–60, 132–133, 254–269)36.

Our assumption that the indicators for the share of children may have exceeded the values we found for Polish-Lithuanian lands, while the approximate proportions for older people were maintained, can be illustrated by the mid-19th-century statistics for the rural United States. It is important to bear in mind, however, that in the pioneering populations of rural North America the birth rates of whites were still well above 40 per thousand, even around 1850 (Coale and Zelnik 1963). The low minimal values for Polish-Lithuanian lands clearly point to the sporadic underregistration of children, although our collection lists relatively few parishes in which the share of young people did not exceed the 25 percent threshold. More alarming are the minimal (equaling zero) index values for elderly individuals, which have already been hinted at. Similar cases occurred, however, in virtually all of the listings included in the comparison, the majority of which probably had a higher level of population registration than our 18th-century data. Moreover, the maximum values registered in the Polish-Lithuanian material do not seem to be entirely erroneous. For the young people, the extreme values mirror those found for American young people. For the older people, the values were consistently lower than or comparable to data from nearly all other listings.

Still, it might be more conclusive to compare the CEURFAMFORM data with the statistics for Polish-Lithuanian lands inferred from independent sources which have a similar chronology and cover some of the same territory. This was attempted in Table 8. The table is, however, limited in that it includes only the proportions of individuals below the age of 15, and only for men37. In order to facilitate more precise comparisons, the territories under investigation from our material were further broken down into eastern and western regions, and region 10 (Zhytomir district in Ukraine) was separated from the eastern regions. In general spatial terms, the objects from the first group correspond to data from the Szadkowski and Kaliski districts (the Kaliski district adjoined regions 4 and 5 from our database in the north, while the Szadkowski district lay slightly to the northwest of region 6). The grouping ‘east’ partially corresponds to the regions of right-bank Ukraine, while region 10 overlaps with one of the districts investigated by Kędelski.


Table 7: Proportion of minors and elderly in various historical populations.


All values weighted by size of settlements (parish, villages, communes, counties).

For Poland-Lithuania: values in brackets refer to data including parishes with potential underregistration of young people.

Source: for Poland-Lithuania – CEURFAMFORM Database; for Westphalian parishes, Schleswig-Holstein, GCU and Wiener Datenbank – Mosaic Historical Data files; for Norway – data provided by A. Solli (Bergen); for the US (144 counties, rural population only) – 1% sample of the IPUMS-USA, Integrated Public Use Microdata Series: Version 5.0 [Machine-readable database], Minneapolis: University of Minnesota, 2010; for Great Britain – 437 parishes (rural population only); 2% cluster sample of parishes from the NAPP release (Minnesota Population Center. North Atlantic Population Project: Complete Count Microdata. Version 2.0 [Machine-readable database]. Minneapolis, MN: Minnesota Population Center, 2008; K. Schürer and M. Woollard, National Sample from the 1851 Census of Great Britain [computer file], Colchester, Essex: History Data Service, UK Data Archive [distributor], 2008).


Table 8: Proportion of minors in various populations of historical Poland-Lithuania (males only).


All cases weighted by size of male population in parishes.

Source: CEURFAMFORM Database; for Szadkowski district – Janczak 1975; for Kaliski district – Rusiński 1970; for right-bank Ukraine – Kędelski 1990.

The indices from our western parishes differ only slightly from the values calculated for the Szadkowski and Kaliski districts, and therefore do not raise any particular suspicions38. The comparison is likely to show our materials in a favorable light, especially considering that the census returns from 1789 by definition included the entire populations of particular villages and parishes – and thus also included the inhabitants of manors, presbyteries, and farmhands’ premises – whereas our data included only peasant family households. Still, this fact does not explain the much lower minimal values in our materials39.

The results of a comparison of our data for the eastern regions with the 18th-century Ukrainian statistics proved less favorable, although the more serious underestimations seem to have affected mainly the Belarusian regions. The percentages of boys below the age of 15 in region 10 were very similar to the shares found in both the aggregated Ukrainian data from 1789 and the data for the Zhytomir district from the same year (Kędelski 1990, 71–72), and to the minimal index values found in both independent sources. The seemingly better coverage of the listings from region 10 – compared to the listings from 11N and 11S – suggests that caution should be used in interpreting some of the estimates for the Belarusian regions, especially in the sections devoted to issues of mean household size (Ch. 10).

1.4.2  Proportion of infants in the population

The relatively minor departures of our statistics from the theoretically predicted values and from other auxiliary evidence do not preclude the possibility that our regional populations were subject to selective underenumeration. This feature of our data can be seen most clearly in Figures 5–16 presented in later sections, which testify to the ‘undercutting’ of age pyramids resulting from an underrepresentation of the ‘age zero generation.’ In most pyramids, a deficit of the youngest children can be easily discerned, while the expected tapering first appears among age groups three years or older (less frequently among age groups two years or older). Below these age groups, however, the pyramids appear to have eroded. Determining the actual scale of this phenomenon requires us to assume an appropriate reference threshold. According to Polish historical demography, infants should not constitute less than four percent of the total population, while the combination of generations zero and one should generate values of between eight and 10 percent (Gieysztorowa 1976, 132; Kuklo 1998, 46).
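The reference thresholds for the youngest children can be checked with a similarly small routine. The sketch below is hypothetical: the function names are invented for illustration, and the lower bounds of four and eight percent are simply the values quoted above, used here as assumed cut-offs.

```python
def youngest_shares(ages):
    """Return (percent aged 0, percent aged 0 or 1) for a list of ages."""
    n = len(ages)
    if n == 0:
        raise ValueError("empty listing")
    age0 = sum(1 for a in ages if a == 0)
    age1 = sum(1 for a in ages if a == 1)
    return 100.0 * age0 / n, 100.0 * (age0 + age1) / n


def infants_undercounted(ages, min_infants=4.0, min_zero_one=8.0):
    """True if either share falls below the assumed reference threshold."""
    infants, zero_one = youngest_shares(ages)
    return infants < min_infants or zero_one < min_zero_one
```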

Some scholars have posited that the listings of the Civil-Military Order Commissions may not have covered children under one year old (see Kuklo 1998, 44). As we can see in Table 9, however, this issue has yet to be definitively resolved. In most of the regions, including those in which the majority of listings came from the Commissions’ materials (regions 3–6, 8, and 10), the youngest generations were definitely undercounted, but the registration of infants was entirely abandoned only in region 4.

The registration of the youngest children was more accurate in regions 1, 7, and 9, although even there the recorded shares were only about half of what might be expected. Following the age zero generation, the generation of children under two years old – assuming a relatively equal birth rate in successive years – should have been less numerous owing to the effects of infant mortality. However, substantial omissions of the youngest generation resulted in a lack of regularity in our materials. Particularly alarming is the gap between the percentage of one-year-olds and two-year-olds, with the latter group generally being larger. This seems to strongly suggest that the registration of one-year-olds was incomplete, and does not appear to indicate that the census-takers assigned these children to the following generation (cf. Kuklo 1998, 45–46). A rough projection of the figures from the last column of Table 9 downward from the three-year-old age group would suggest an underenumeration of approximately 80 percent of newborns, 16 percent of one-year-olds, and six percent of two-year-olds, with one-third of children in the zero to two range having been omitted from enumerations.

These deficiencies in the registration of infants invariably reduce the precision of our estimates of the mean size of the domestic group. Thus, the estimations presented in Ch. 10 should be seen as minimal values. Any estimates implying the precise recording of a transition to the first child in the listings (proportions of young adult males/females/families living with at least one child) should also be viewed with caution, along with the distinctions made between the different types of nuclear families (those living with or without children). These are, however, relatively minor issues which do not represent serious impediments to the investigation, presented in Ch. 10, of living arrangements and household structures40.


Table 9: Percent share of youngest children by age, and regions of Poland-Lithuania.


Source: CEURFAMFORM Database.

1.4.3  The proportions of elderly people and the frequency of extended households

A much more serious issue is the estimation of the degree to which the observed variability in proportions of elderly people could determine the frequency of the occurrence of households with upward extensions from the head (Laslett’s type 4a and 5a). Variation in demographic conditions may have had a significant impact on indicators of family structure. In populations with a very small number of elderly people only a few households would have had the potential to include elderly kin, and hence to become extended in structure (Ruggles 2010; 2012; Gruber and Szołtysek 2012).


Figure 4: Scatterplot of the proportions of elderly people and the proportions of domestic units with upward extension (parish/estate-level); all regions of Poland-Lithuania.

Source: CEURFAMFORM Database. Data for 220 parishes/estates. Units with upward extension are those of Hammel-Laslett types 4a and 5a.

For 220 parishes from the regions of Poland-Lithuania we investigated this issue using a simple linear regression model (Figure 4). In our case, the shares of domestic units with upward extension did not exhibit any significant linear dependence on the proportions of elderly people (R² = .003), even after controlling for the size of locality. Thus, variability in the share of the aged population in our collection does not seem to be responsible for inter-parochial differences in extended family structures. These differences must, then, result from factors other than demographic variation.
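For readers who wish to reproduce this kind of check on their own parish-level data, a bivariate least-squares fit and its coefficient of determination can be computed as in the sketch below. It is a minimal illustration only: the function name is hypothetical, the inputs are assumed to be two equal-length lists (share of elderly and share of upward-extended units per parish), and it does not include the control for locality size mentioned above.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, r_squared). x and y are equal-length
    sequences, e.g. parish-level percentages (an assumed data layout).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in x or y")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # In the bivariate case R squared equals the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared
```

An R² close to zero, as reported for Figure 4, means that knowing a parish's share of elderly people tells us almost nothing about its share of upward-extended units.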

1.5  Age heaping and digit preference

Age structures represent the starting point for any population study. Obtaining information on age structures and plotting it on a graph is often the first step in seeking to understand the nature of processes affecting populations. It also provides an essential guide to considering potential drawbacks and deficiencies in the census coverage. To this end, in the following section we will move beyond our previous focus on the youngest and the oldest population subsets, and look at the distribution of individuals across all ages in our regional data. An indispensable aid in achieving this goal is the visual representation of these statistics in the form of population pyramids.

An overall investigation of the age-sex structure according to the requirements of contemporary demographic statistics is in our case disrupted by a substantial accumulation of age values on particular numbers (as illustrated in Figures 5–16 below). This specific type of age misreporting, which is known as age heaping (the rounding of ages), constitutes ‘one of demography’s most frustrating problems’ (Ewbank 1981, 88). It represents an insidious obstacle in census enumeration because these digit preferences are difficult or even impossible to detect at an individual level (Steckel 1991, 581–82)41.

People with lower mental capacities who were not able to accurately determine their ages (and in practice had almost no recourse to written baptismal records), or who lacked numerical discipline, could only give a rough estimate of their age. In making such estimates they likely used the ages of close relatives – i.e., parents, siblings, or children – as points of reference (see Herlihy and Klapisch-Zuber 1985, 109; also Ewbank 1981, 5–17)42. As a result, the data on ages are rarely completely incoherent, although they often lack the level of precision that would satisfy a demographer. Depending on the circumstances, the ages might have been overestimated, inflated, or deflated. The extent to which these distortions occurred can be measured by means of age heaping indices, which measure the tendency in a population to round ages using certain digits. It is also widely assumed that digit preferences (particularly for zero and five) are likely to be linked to other sources of inaccuracy in age statements, and to a general lack of reliability of the age distribution (United Nations 1990, 20).

1.5.1  General patterns in digit preference

Figures 5 through 16 present the age-sex structure of the aggregate population of the CEURFAMFORM database, its two main geographical conglomerates, and pyramids for nine regions deemed illustrative of a range of possible patterns discernible in the entire collection43.

Almost all of these figures show that certain numbers in our listings had a powerful attraction. Even a cursory look at the pyramids tells us, however, that the selection of declared ages in the enumerations was not entirely arbitrary, and that rounding generally occurred in consistent patterns. Especially after the age of 20 (and in some regions even earlier), most of the pyramids show pronounced spikes at the decadal years and secondary spikes at ages ending in five. Single-year age groups one or two digits apart may show enormous variations in size. For both sexes, the most ‘crowded’ age was 30, followed by 40, although there was some regional variation in this pattern44.


Figure 5: The age and sex distribution of a population of Poland-Lithuania (entire collection).


Figure 6: The age and sex distribution of a population of western regions of Poland-Lithuania.


Figure 7: The age and sex distribution of a population of eastern regions of Poland-Lithuania.


Figure 8: The age and sex distribution of a population of Region 2 of Poland-Lithuania.


Figure 9: The age and sex distribution of a population of Region 3 of Poland-Lithuania.


Figure 10: The age and sex distribution of a population of Region 4 of Poland-Lithuania.


Figure 11: The age and sex distribution of a population of Region 5 of Poland-Lithuania.


Figure 12: The age and sex distribution of a population of Region 6 of Poland-Lithuania.


Figure 13: The age and sex distribution of a population of Region 7 of Poland-Lithuania.


Figure 14: The age and sex distribution of a population of Region 8 of Poland-Lithuania.


Figure 15: The age and sex distribution of a population of Region 10 of Poland-Lithuania.


Figure 16: The age and sex distribution of a population of Region 11S of Poland-Lithuania.

This stress on round (decadal) ages persisted in older age groups, while the preference for reporting ages ending in a five and in other digits declined. A provisional inspection of the census figures shows no signs of a preferential reporting of ages with the even-numbered terminal digits two, four, six, and eight over those ending in one, three, seven, and nine; indeed, the degree of preference was almost the same for both categories of digits45. In regions 8 and 10, and especially in region 7, preferences for certain digits appear to have been much less pronounced.

1.5.2  Digit preference on zero and five

A better test of the general reliability of our age statistics can be conducted by referring to the age heaping index, which measures the degree of preference for or avoidance of ages ending in zero and five (the so-called Whipple’s index)46. Index values are given for all regions in Table 10.

The United Nations has stated that if the values of Whipple’s index are less than 105, then the age distribution is deemed ‘highly accurate.’ If the index values oscillate between 105 and 109.9, the age distribution is considered ‘fairly accurate.’ Meanwhile, values of between 110 and 124.9 are deemed ‘approximate;’ values of between 125 and 174.9 are considered ‘rough;’ and values of 175 or higher are deemed ‘very rough’ (United Nations 1990, 18–19). In our case (column B in Table 10), only one region had an index value which is acceptable by modern demographic standards (region 7), though another region was close (region 10). These regions differed substantially in terms of their location and the type of census used. Region 7 had a mostly German-speaking population and kept records using the Protestant ‘lists of souls.’ In region 10, the records came from remnants of the Polish-Lithuanian census of 1791, and the population consisted mainly of Greek Catholics and Ukrainians. In all of the remaining regions, the Whipple’s index values were well above the level of 125 (which is the lower bound of poor age reporting quality), and the values were over 175 in most cases47.
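The index and the United Nations quality bands can be expressed compactly in code. The following sketch uses the conventional definition of Whipple's index (persons reporting ages ending in zero or five among those aged 23–62, relative to one-fifth of all persons in that range, times 100). The function names are hypothetical, and the default age range is the conventional one, assumed here rather than taken from the database documentation.

```python
def whipple_index(ages, lo=23, hi=62):
    """Whipple's index for a list of single-year ages.

    100 means no heaping on 0 and 5; 500 means every reported age
    in the range ends in 0 or 5. The range 23-62 is the conventional
    choice and is an assumption of this sketch.
    """
    in_range = [a for a in ages if lo <= a <= hi]
    if not in_range:
        raise ValueError("no ages in the evaluation range")
    heaped = sum(1 for a in in_range if a % 5 == 0)
    return 100.0 * 5 * heaped / len(in_range)


def un_quality_class(w):
    """Map an index value to the UN bands quoted in the text."""
    if w < 105:
        return "highly accurate"
    if w < 110:
        return "fairly accurate"
    if w < 125:
        return "approximate"
    if w < 175:
        return "rough"
    return "very rough"
```

Applied to the aggregate values reported below for western and eastern parishes (180 and 236), `un_quality_class` places both groups in the 'very rough' band.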


Table 10: Measures of age heaping and the dispersion of age heaping across parishes/estates, by regions of Poland-Lithuania.


Source: CEURFAMFORM Database. Data for 220 parishes with 141,172 individual observations.

*CV was calculated from regional sample (parish/estate) means and their respective standard deviations.

** The ABCC Index indicates the percentage of individuals with a correctly reported age (see A’Hearn et al. 2009, and below).

*** Data for only two parishes

**** Data for only one parish

If we regroup the data by treating Hajnal’s line as a demarcation marker, an east-west gradient in the degree of age heaping clearly appears. Overall, eastern parishes had a higher index than western parishes (236 to 180), but this is fundamentally an effect of disproportionately high index values in the Belarusian regions (11N and 11S) and in Ukrainian Podolia (region 9). Two other regions with a predominantly Ruthenian population (regions 8 and 10) depart significantly from this pattern, exhibiting index values distinctly lower than those for western and southern Polish-Lithuanian lands. Parishes from these two regions are also among the 19 parishes with the highest quality of age registration (a Whipple’s index of 100–124).

The extent of digit preference within the regions varied considerably between parishes (see columns E–F in Table 10). Region 7 and the two Belarusian regions displayed the lowest degree of variation in the quality of age reporting. For region 7, this lack of variation indicates that the quality of age registration was high across most of the region’s parishes. For the Belarusian regions, however, this lack of variation suggests that there were only very small intra-regional deviations from the overall tendency to misreport age.

An extension of the original Whipple’s index is provided by its linear transformation, which yields the share of individuals with a proper age record (the so-called ABCC Index; see A’Hearn et al. 2009, 788; Crayen and Baten 2010, 84)48. The general characteristics of our dataset indicate that the ages were correctly reported for 72 percent of individuals (Table 10, col. G). This parameter varies from 40 percent in the Belarusian territories to 70–80 percent in the western regions. The 100 percent level was approached only in region 7, although it was also relatively high in regions 8 and 10 (for comparative data, see A’Hearn et al. 2009, 801).
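The linear transformation is simple enough to state in code. The sketch below follows the usual formulation of the ABCC index, ABCC = (1 − (W − 100)/400) × 100 for W ≥ 100 and 100 otherwise; the function name is hypothetical, and the input values in the usage note are the aggregate Whipple's indices quoted earlier for western and eastern parishes.

```python
def abcc_index(whipple):
    """Convert a Whipple's index into the ABCC index (percent).

    Values below 100 are set to 100, following the usual convention
    for this transformation (assumed here).
    """
    if whipple < 100:
        return 100.0
    return (1.0 - (whipple - 100.0) / 400.0) * 100.0
```

Under this formula a Whipple's index of 180 corresponds to an ABCC of 80 percent, and an index of 236 to 66 percent; an ABCC of 40 percent implies a Whipple's index of 340.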

Table 11 allows us to compare the age heaping indicators from pre-modern Poland-Lithuania with those of other European and non-European societies of the preindustrial and modern eras (sites are sorted chronologically within broad geographical regions). The overall quality of age reporting in our dataset does not compare favorably with the indicators calculated from state-administered censuses from the Netherlands, the southern Danish provinces, and Norway; or with values inferred from nationwide censuses of the US and Great Britain one century later. However, figures from some regions of historical Poland-Lithuania approach these quality levels (regions 8 and 10).

Table 11: Age heaping indicators for Poland-Lithuania and other European and non-European societies.
| Region/location | Whipple’s index |
|---|---|
| Reims, 1422 | 234 |
| 8 Westphalian parishes, 1751 | 263 |
| Southern Netherlands, 1796 | 163 |
| Rural Norway, 1801 | 126 |
| Schleswig-Holstein, 1803 | 109 |
| Rural US, 1850 | 142 |
| Rural Britain, 1851 | 125 |
| Tuscany (without Florence), 1427 | 289 |
| Poland, 1921 | 135 |
| Tula (central Russia), 1715–20 | 257 |
| Viatka (northeastern Russia), 1710–17 | 342 |
| Russia, 1897 | 175 |
| 7 Latin American countries, 1744–1899 | 247 |
| Albania, 1918 | 324 |
| Iran, 1976 | 163 |
| Indonesia, 1980 | 222 |
| Afghanistan, 1979 | 297 |

Source: for Poland-Lithuania – CEURFAMFORM Database; for Albania – data provided by S. Gruber (also Gruber 2001); for Reims, Tuscany and southern Netherlands – De Moor and Van Zanden 2010; for Westphalia and Schleswig-Holstein – Mosaic Historical Datafiles; for Norway – data from the census of Norway provided by A. Solli; rural Britain and the US – North Atlantic Population Project (1 percent samples); for Tula and Viatka – Kaiser and Engel 1993; for Poland 1921 – own calculations based on the First General Census in Poland; for 1897 Russia – Baten and Szołtysek 2012; for Latin America – Manzel et al. 2012; for Iran, Indonesia and Afghanistan – Jowett and Li 1992, 434.

Our data compare somewhat more favorably with data from the status animarum, which was conducted in the mid-18th century by the church administration of the Münster bishopric (Westphalia). In this case, the levels of age misreporting exceeded the levels recorded in all of the western regions of our sample, and in historical Poland-Lithuania overall. The consistency of age reporting in our material is also in line with that of listings from other parts of eastern and southern Europe and from historical Latin America. The Whipple’s index for our entire collection is lower than it is in all of these locations, especially in comparison with 18th-century Russia, rural Tuscany, and the Balkans. This does not, however, apply to the age data from the Eastern Borderlands (regions 11N and 11S), as the quality of the data from these regions is worse than that of countries with very poor records of age registration, such as Russia, Albania, and contemporary Afghanistan.

The material included in Table 10 also provides some clues about the gender differentials in age heaping in our listings. The popular belief that men have higher levels of quantitative literacy (and thus a higher level of human capital) than women, as is assumed to be the case in traditional patriarchal societies49, is not supported by our data. Women did slightly worse than men on the Whipple’s index in only three out of the 12 regions, and even in the regions characterized by patrilocal joint-family organization (11N and 11S) women seem to have been less prone to age misreporting than men. The gender equality index in numeracy calculated for the entire Polish-Lithuanian collection yields a positive value of 6.1, which indicates that the quality of age reporting was higher among women than among men50.
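One common way of expressing such a gender gap in the age-heaping literature is the relative difference between the male and female Whipple's indices, signed so that positive values mean less heaping among women. The sketch below uses that formulation; whether it matches the exact variant behind the figure of 6.1 is an assumption, and the function name and input values are purely illustrative.

```python
def gender_equality_index(whipple_male, whipple_female):
    """Relative male-female gap in Whipple's index, in percent.

    Positive values indicate less age heaping (better reporting)
    among women. This formulation is assumed, not taken from the
    source tables.
    """
    if whipple_male <= 0:
        raise ValueError("male index must be positive")
    return (whipple_male - whipple_female) / whipple_male * 100.0
```

For instance, hypothetical indices of 200 for men and 187.8 for women would give a value of 6.1.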

It has been suggested that age rounding becomes more common as people age (A’Hearn et al. 2009; Crayen and Baten 2010, 93–96). Figure 17 displays the values of Whipple’s indices separately for males and females grouped in five 10-year age groups for the entire collection, and for the western and the eastern agglomerations separately51. Generally, less age heaping was observed among individuals aged 23–32. However, no clear tendency for index values to increase with age could be found in most of the aggregated data. The propensity to round off one’s age distinctly increased among females, but much less so among males. More importantly, the curves for the overall population appear to be the composites of two distinct and largely opposite tendencies in the western and the eastern parishes. In the western regions, the actual rise in the propensity for age rounding among the middle-aged populations of both sexes was counterbalanced by a subsequent fall in index values. This is surprising since older individuals should be more likely to forget their age, or to pay less attention to it (Crayen and Baten 2010). The old-age effect in heaping appears to have been stronger in the regions with higher levels of digit preference (eastern territories) among both men and women52. However, in all of the groupings presented in Figure 17, a reversal of the male and the female patterns of digit overselection can be seen. Age rounding was more prevalent among younger males than among younger females, while the opposite pattern can be observed among the older age groups.


Figure 17: Age- and sex-specific Whipple’s indices in western and eastern areas of Poland-Lithuania.

Source: CEURFAMFORM Database.

1.5.3  Preference for and avoidance of all digits of age

Whipple’s index is a very effective measure of age accuracy, but it has some drawbacks, including the fact that it can only be used to measure digit preference. Moreover, the sole focus of the index is on age heaping for the digits zero and five (United Nations 1955, 41). It has been shown that in some societies there is a decided preference for figures ending with digits other than zero and five, and that the avoidance of some numbers can occur in a patterned way as well (Stockwell 1966; Stockwell and Wicks 1974; Nagi et al. 1973; De Moor and Van Zanden 2010, 184; also Jowett and Li 1992).

The investigation of patterns of digit preference and avoidance is therefore a natural extension of the analysis of quantitative numeracy in populations. It is, however, primarily seen as a tool for the selection of optimal age groupings in the data (Myers 1940; Young 1900). This can be done through the application of Myers’ method of ‘blending’ to single-year-of-age census data53.

Table 12 (below) shows the preference for digits of age in the complete collection and in all of the regional groupings, together with the range and the pattern of deviations in the relevant indexes. In the most general terms, the summary indices of preference (at the bottom of the table) replicate the ranking of our regions that was already obtained with the Whipple’s measure. The Myers’ index of preference for the entire collection is 20.8, and is thus very high relative to the values found in both historical and contemporary developing countries54. Thus, for at least one-fifth of the entire population of our database, age was reported with an incorrect final digit, and this proportion increases to around 40 percent in some of the eastern regions.
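Myers' blending procedure can be sketched as follows. The code implements the textbook version of the method: for each terminal digit, counts at ages 10–89 and at ages 20–99 are combined with weights d + 1 and 9 − d, the blended totals are converted to percentages, and deviations from the 10 percent standard are taken. The summary index is computed as half the sum of absolute deviations, which is consistent with its reading above as a minimum share of misreported final digits. The function name and the age range are assumptions of this sketch and may differ from the settings used for Table 12.

```python
def myers_blended(counts):
    """Myers' blended method for a mapping {age: number of persons}.

    Returns (deviations, index): deviations[d] is the percentage-point
    deviation of terminal digit d from 10 percent, and index is half
    the sum of absolute deviations (0 = no preference, 90 = all ages
    reported on a single digit). Ages 10-99 are used (an assumption).
    """
    blended = []
    for d in range(10):
        # Ages 10+d, 20+d, ..., 80+d (first sum, ages 10-89).
        first = sum(counts.get(a, 0) for a in range(10 + d, 90, 10))
        # Ages 20+d, 30+d, ..., 90+d (second sum, ages 20-99).
        second = sum(counts.get(a, 0) for a in range(20 + d, 100, 10))
        blended.append((d + 1) * first + (9 - d) * second)
    total = sum(blended)
    if total == 0:
        raise ValueError("no persons aged 10-99")
    deviations = [100.0 * b / total - 10.0 for b in blended]
    index = sum(abs(dev) for dev in deviations) / 2.0
    return deviations, index
```

A positive deviation marks a preferred digit and a negative one an avoided digit; the blending step is what removes the natural advantage that lower terminal digits have in a population whose size declines with age.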

The groupings with the highest Whipple’s index values, and thus the poorest age reporting (regions 9, 11N, and 11S), also had by far the worst values when the new indicator was applied. The new measure told more or less the same story for the majority of the western regions as well, except Silesia (region 7). Based on Table 10, we can see that of the three regions exhibiting a relatively high age registration quality (regions 7, 8, and 10), only the first one retained a leading position once the additional measure was taken into account55.


Table 12: Terminal digits deviations and indexes of preferences, by regions of Poland-Lithuania.


Source: CEURFAMFORM Database.

For each regional grouping: column 1 indicates population at each digit of age as a percentage of the total population; column 2 – terminal digit deviations from the 10 percent standard.


Figure 18: Myers’ terminal digit deviations in western and eastern areas of Poland-Lithuania.

Source: CEURFAMFORM Database.


Figure 19: Myers’ terminal digit deviations by sex, entire Poland-Lithuania.

Source: CEURFAMFORM Database.

In the majority of regions, substantial positive deviations appeared only for the terminal digits zero and five. For the digit zero, the proportions were as much as 100 percent or more above the theoretically expected proportions across most of the western regions. In the three eastern regions, these values were as much as three or four times higher than expected, indicating an enormous overselection of ages ending in zero, as was signaled by the Whipple’s indices. The age heaping at the digit five was much less pronounced, but it was still non-negligible, appearing in all but four regions of the sample. However, the digit five yielded a relative excess of 30 percent over the theoretically expected proportion in only two of the regions (11N and 11S). These patterns were accompanied by a substantial concentration on ages ending in one in two regions, and on ages ending in six in five other regions. The corresponding pattern of digit avoidance is more complex and puzzling. Still, some degree of avoidance was discernible (values below six percent) for the digits seven, one, three, and nine.

It may be concluded from Table 12 that in terms of digit preference, our aggregated population constitutes an amalgam of at times very distinct regional tendencies, although there is indeed a common trend for the majority of territories. When we look at the second and third columns of Table 16 in conjunction with Figures 18–19 above, we can speculate as to an ‘average’ pattern of digit preference and/or avoidance in our data. It is possible to discern congruence in the overall patterns of digit preference in two broad spatial entities, which – except for some differences for the digits five and six – differ only in magnitude, but not in the direction of heaping. An examination of Figures 18–19 suggests that the lion’s share of the concentration on ages ending in zero was drawn not from the nearest year of concentration, or even from the two digits on either side, but rather from several digits above and below the most preferred digit. Accordingly, the digits from above, such as two, three, and four, and the digits from below, such as seven, eight, and nine, have been systematically avoided in the listings, making an almost equal contribution to the over-reporting of individuals at ages ending in zero56. This order of digit preference and avoidance was retained after the data were aggregated for the entire collection. These findings must be taken into account as we categorize ages (see below).

1.5.4  Who was rounding off their age, and why?

Who was rounding off their age, and why? A simple cross-tabulation of the percentage of individuals reporting ages ending in zero and multiples of five by geographic, demographic, residential, and source-related characteristics should help to shed some light on this question (Table 13). The regional panels of Table 13 are in line with our preceding explorations of the spatial patterns of age heaping, as they illustrate the exceptional nature of the household listings from regions 7, 8, and 10, as well as the general superiority of age reporting in parishes located in the western part of the country. Our previous observations are corroborated in yet another respect, as digit preference seems to have had a slightly more marked effect on the age returns of males than on those of females. Numbers from the age group panel in Table 13 seem to be quite revealing as well, as they show a clear old-age effect in the tendency to heap ages.
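A cross-tabulation of this kind is straightforward to compute from individual-level records. The sketch below assumes a hypothetical data layout of (group, age) pairs, where the group could be a region, a sex, or a household position; it does not reproduce the age restrictions or weighting that may have been applied in Table 13.

```python
from collections import defaultdict


def heaping_by_group(records):
    """Share of rounded ages per group.

    records is an iterable of (group, age) pairs (assumed layout).
    Returns {group: (pct_ending_in_zero, pct_multiple_of_five)}.
    """
    totals = defaultdict(int)
    zeros = defaultdict(int)
    fives = defaultdict(int)
    for group, age in records:
        totals[group] += 1
        if age % 10 == 0:
            zeros[group] += 1
        if age % 5 == 0:  # includes ages ending in zero
            fives[group] += 1
    return {
        g: (100.0 * zeros[g] / n, 100.0 * fives[g] / n)
        for g, n in totals.items()
    }
```

With nine other terminal digits available, roughly 10 percent of ages ending in zero and 20 percent of ages divisible by five would be expected in the absence of heaping, so group shares well above those levels point to rounding.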

The statistics presented in Table 13 demonstrate interesting connections between age misreporting and household status. Among all household members, age rounding was most frequently observed among the parents of the head. These findings are consistent across all of the relevant columns of the table (cf. Dillon 2008, 110). This may suggest that the intimacy of intergenerational co-residence and the widely accepted notion – especially in the eastern regions – that the elderly had an exceptional and superior social status (Obrębski 2007) did not necessarily imply that household members knew each other’s precise ages. Equally surprising is the finding that in the west, where both service and lodging were prevalent, the age reports provided by the household heads and their spouses were not much more accurate than those supplied by marginal household members; whereas in the eastern parishes (and in the entire aggregated collection), they were markedly worse57. Although there is little direct evidence of how exactly the ‘survey teams’ gathered their data in the enumeration process in Poland-Lithuania, a provisional inference from the above findings could be that the heads did not always supply the age information for all of the household residents (cf. Kaiser and Engel 1993, 832–834). Moreover, across all of the samples, heads’ co-resident children were found to have been less likely than other domestic group members to have reported ages rounded to zero, but they were not shown to have been substantially less likely to have engaged in age heaping in general (cf. Kopczyński 1998, 73–74).

The size of the population registered in a listing – but not the structure and size of the domestic groups in which the population lived – also seems to have mattered (albeit mainly in the west). In the west, the proportions of individuals rounding off their ages were very similar irrespective of the structural characteristics of the dwelling they occupied. There is evidence that in the east a lack of co-resident kin in the household might have had an aggravating effect on the individual tendency to misreport age, but it is only partly confirmed when the entire collection is inspected.

The above-presented analysis reveals complex and interesting patterns, but it tells us nothing about the relative strength of these different associations, or about the extent to which certain patterns remain when other factors are controlled for. In real life, a person’s behavior is simultaneously affected by many factors. Thus, in order to determine the unique effect of a specified characteristic of an individual, we need to make sure that the confounding effects of all of the extraneous variables are eliminated. In order to get a better sense of how each individual characteristic was associated with a given behavior, we used logistic regression, and focused on one particular aspect of its interpretation; i.e., the odds ratio58. In regression techniques, this ratio is used to assess the risk of a particular outcome (in our case, the risk of rounding off the age) if a certain factor (or characteristic) is present. The odds ratio tells us how much more likely it is that a person who is exposed to the factor under study will develop the outcome than a person in a comparison group (often called a reference group).
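To make this interpretation concrete, the following minimal Python sketch computes an odds ratio from a simple two-by-two table. The counts are invented for illustration and are not taken from the CEURFAMFORM Database, and the function names are hypothetical. Strictly speaking, the ratio compares odds rather than probabilities, so it approximates ‘how much more likely’ only when the outcome is not too common.

```python
def odds(events, non_events):
    """Odds of the outcome: cases with the outcome per case without it."""
    return events / non_events


def odds_ratio(exposed_yes, exposed_no, reference_yes, reference_no):
    """Odds of the outcome in the exposed group relative to the reference group."""
    return odds(exposed_yes, exposed_no) / odds(reference_yes, reference_no)


def percent_change_in_odds(ratio):
    """Express an odds ratio as a percentage difference from the reference (OR = 1)."""
    return (ratio - 1.0) * 100.0


# Hypothetical counts (not from the source): rounded vs. exact ages in two groups.
example_or = odds_ratio(exposed_yes=300, exposed_no=700,
                        reference_yes=200, reference_no=800)
example_pct = percent_change_in_odds(example_or)
print(round(example_or, 2), round(example_pct, 1))
```

Read in this way, an odds ratio of 1.50 corresponds to odds that are 50 percent higher than in the reference group, whose own odds ratio is one by construction.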


Table 13: Persons reporting an age ending in zero and zero or five by individual and group characteristics, western and eastern parishes of Poland-Lithuania (%).


Source: CEURFAMFORM Database. Persons aged 15+.

Table 14 presents logistic regressions on the probability of reporting an age ending in zero or multiples of five for individuals aged 15 and above (for the sake of simplicity, the regressions were run only for the aggregated collection). Most of the variables used previously in the cross-tabulations were included as predictors (they are listed down the left side of the regression table), and for each of them we first ran a bivariate model, which was subsequently complemented with a model that included all of the covariates59. This allowed us to see how the effect of an individual characteristic changed when we shifted from analyzing the given relationship in isolation to assessing its importance in the multivariate context (note, however, that in Table 14, as well as in Table 15, only the results of the multivariate model are included; all of the reference groups among the predictors have odds ratios of one, i.e., 100 percent).


Table 14: Logistic regression on the probability of reporting an age ending in zero and zero or five on selected characteristics: persons aged 15+ in Poland-Lithuania (Odds Ratios); basic model.


Source: CEURFAMFORM Database.

Overall, the multivariate analysis confirmed many of the patterns observed in the cross-tabulations presented before, albeit with some important modifications. Even after controlling for individual demographic and residential characteristics, individuals from eastern parishes were 50 to 58 percent more likely to report a rounded age than individuals from western parishes; but this east-west differential diminished when the effects of the survey type were held constant. When judged independently of the origin of the census category, individuals from the east scored 22–23 percent better than individuals from the west. This indicates that the difference observed between the two clusters in the bivariate model was driven primarily by specific data collection mechanisms in these eastern territories, and that there was nothing inherently ‘eastern’ in the propensity for age rounding. Regression of sex on the propensity for age heaping confirmed that women were slightly less likely to round off their ages than men, and this finding remained robust to the effects of the other predictors (especially for the second of our response variables). As witnessed in the cross-tabulations in Table 13, age itself remained strongly associated with digit preference, although the odds ratios were consistently higher for the probability of age rounding in multiples of five than in zero only. Even after controlling for a host of interacting variables (including an eastern or a western geographical location, as well as the type of household listing), individuals aged 35 and above were found to be twice as likely to heap ages on zero or five.

In the initial analysis it appeared that people who were living without a spouse were less likely to have reported their age in an even number. However, after we shifted our focus to a digit preference for multiples of five, and we applied basic controls for age, it was found that the likelihood that people without a co-resident spouse would misreport their age was 27 percent higher relative to people living with a partner; which is in line with theoretical expectations.

Our findings on the influence of household status were more ambiguous. In the model assessing the probability of digit preference on even numbers, members of the parental generation were found to have been 1.38 times more likely to have rounded off their age than heads of domestic groups. But regressing the group status against the second response variable made this discrepancy much more modest in magnitude, and yielded a likelihood for age rounding that was only six percent higher than that of heads. Thus, being a parent was shown to have affected the probability of age rounding as much as being a relative other than a spouse or a child. The spouses of the head were less likely to have rounded off their age to zero than the heads themselves, and they were more likely to have done so than co-resident children; but this relationship was less clear-cut in the second model.

The most consistent results were obtained for the category of co-resident non-kin. For both types of age rounding, holding demographic characteristics constant60 yielded a positive association between being a co-resident non-relative and the propensity for age rounding. The finding that marginal household members were more likely to have engaged in age heaping than relatives of the household head seems to provide strong evidence that the heads were involved in the process of age reporting. This appears to call for a revision of the previous observations which were based only on cross-tabulations. We will return to this issue later.

The modest effect of the size of the settlement in which individuals lived on the likelihood that they had rounded off their age was confirmed in both the bivariate and the multivariate models. All other things being equal, people living in communities with over 1,000 inhabitants were on average 1.30 times as likely to misreport their ages as people living in communities with fewer than 500 inhabitants. However, the exact threshold of population size that mattered for age reporting quality remains unclear. The same can be said about the relationship between the size of the domestic group and the extent of age heaping, although in all of the models people living in larger households seem to have been less prone to rounding off their age.

In the bivariate model, the propensity of simple household members to round off their age was systematically outstripped by the age heaping tendencies of people in other residential arrangements61. That changed, however, after additional variables were alternately entered into the equation. Controlling for the type of listing and the individual household status obliterated the relationship between age rounding and living in extended and multiple family environments62. However, the distinctive effect of living in a residence group with neither a parental nor a conjugal link present (‘no family’) remains genuinely robust to other confounding interactions. With either of our response variables, individuals in this type of residential situation were 39 to 60 percent more likely to misreport their age than individuals in domestic groups with a conjugal family unit. The situation of people in solitary households was similar, particularly when rounding to even numbers was involved. These findings point to the significance of conjugal and parental links for the quality of age reporting. It seems that a lack of family members in the household might have had an aggravating effect on the individual tendency to misreport age in our data.

One missing aspect of the previously discussed models is that they could only partly account for the possible effects of household heads’ numeric abilities. If we assume that these heads normally provided enumerators with information on the ages of all of the people living in the household, then we would expect household heads who rounded off their own age to also have been more likely to misreport the ages of other domestic group members. To investigate this issue, we ran additional regressions using a surrogate dataset which included one previously unexplored individual characteristic: i.e., information on a person’s membership in a household in which the head had their age rounded to zero or multiples of five63.
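A household-level indicator of this kind can be derived from person-level records along the following lines. This is only a sketch under stated assumptions: the record layout (`household_id`, `relation`, `age`) and the function names are hypothetical, and the exclusion of the heads themselves from the resulting dataset is an illustrative choice rather than a documented feature of the surrogate dataset.

```python
def ends_in_zero_or_five(age):
    """True if the reported age is a multiple of five (final digit 0 or 5)."""
    return age % 5 == 0


def flag_head_rounding(persons):
    """Attach to each non-head the rounding status of the head of their household.

    `persons` is a list of dicts with the hypothetical keys
    'household_id', 'relation', and 'age'.
    """
    # First pass: record, per household, whether the head's age is rounded.
    head_rounded = {}
    for person in persons:
        if person["relation"] == "head":
            head_rounded[person["household_id"]] = ends_in_zero_or_five(person["age"])

    # Second pass: copy the head's status onto every other household member.
    # Assumption: heads are dropped; households without a recorded head get None.
    return [
        {**person, "head_rounded": head_rounded.get(person["household_id"])}
        for person in persons
        if person["relation"] != "head"
    ]


# Hypothetical records, invented for illustration only.
persons = [
    {"household_id": 1, "relation": "head", "age": 40},
    {"household_id": 1, "relation": "spouse", "age": 37},
    {"household_id": 2, "relation": "head", "age": 43},
    {"household_id": 2, "relation": "child", "age": 20},
]
flagged = flag_head_rounding(persons)
```

The resulting `head_rounded` field can then be entered into a regression as one more predictor alongside the individual characteristics.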

The new models generally produced results that were very similar to those of the previous regressions (Table 15). Most of the predictors had a similar order of importance, although the strength of some of them changed. In the bivariate analysis, the new variable partly surpassed the relative effects of the Russian census type, and had odds ratios which were much higher than any of the other variables except membership in age groups 55 and above. In the multivariate models, the absolute strength of the new variable was mitigated, as were the effects of previous top predictors (census type and age group). Accordingly, the head’s age heaping pattern yielded a risk of an individual rounding off his/her age that ranked only third, after the type of enumeration and the individual’s age group membership.


Table 15: Logistic regression on the probability of reporting an age ending in zero and zero or five on selected characteristics: persons aged 15+ in Poland-Lithuania (Odds Ratios); extended model.


Source: CEURFAMFORM Database.

Holding other variables constant, individuals who were living in a house with a head who had rounded off his age were almost 1.90 times as likely to misreport their own age as their counterparts in domestic groups led by a head who showed no digit preference. Even after controlling for the effect of the household head, age heaping remained much more pronounced in the Russian revisions, and more prevalent among marginal domestic group members and residents of ‘no family’ domestic units.

1.5.5  Age heaping in Poland-Lithuania: discussion

Despite having been drawn from rural societies of one specific historical-geographic area of Europe, our samples reveal huge inter-regional discrepancies in the quality of age reporting, and explaining these differences is a formidable challenge. We may, for example, find that the numerical capabilities of the eastern Greek Catholic populations were very different from those of the predominantly Catholic and Protestant populations of the west. However, this distinction is neither absolutely straightforward in the cross-tabulations, nor does it appear to be robust in multivariate testing. Could it be the case that when we are measuring age heaping, we are actually measuring the numerical abilities of the enumerators or their ability to estimate age, rather than individual capabilities (i.e., ‘quantitative numeracy’)?

Age data are normally derived from statements made by the ‘informant’ or a second party (husband, father, etc.), but they may also be obtained by estimates made by the enumerator. While a large proportion of age misreporting indeed arises because the respondents do not know their exact age, this problem is likely to be exacerbated by differences in the performance of the enumerators, as some of them may take their duties more seriously than others (United Nations 1952, 59)64. The test of the association of the census type with age rounding we have just attempted (see lowest panels of Tables 13–15) may provide some insight into this problem.

In all of the regression outcomes a common pattern can be seen in the risk factor associated with various types of censuses, revealing strong differentials which persisted even after controlling for potentially confounding factors. The probability of reporting an age ending in zero and multiples of five increased systematically when we moved from non-standard types of enumerations (‘other types’), to the reference category of the Commissions’ censuses, to the status animarum, and finally to the Russian revision lists. In different regressions, the likelihood that respondents surveyed in the soul revisions misreported their age was almost four times as large as it was among the respondents in the Commissions’ surveys, and was up to eight times as large as among the respondents in the highest quality records in our possession65.

These differences might be partly explained by looking at the different organizing principles of the enumeration process inherent to different types of listings. The census-taking efforts of the Commissions capitalized on the church administration’s more than one hundred years of experience in surveying people. Local clergy, who were supposed to deliver information to the relevant agencies of the Commissions, were on average very familiar with the religious and everyday conduct of their flocks. Having access to birth and baptism registers, clerics were in a position – at least theoretically – to check and correct age statements provided by their respondents (Ładogórski 1952, 56–57). The age information provided in the status animarum (and the Seelenlisten) in our collection would not have been structured very differently, and the fact that the data we possess appear to be less reliable than listings administered by the Commissions is possibly due to the relatively early origins (17th century) of nearly 30 percent of them. It is not unreasonable to assume that the degree of control over the process of population enumeration was weaker in the pre-Enlightenment period, especially if we take into account the absence of an additional controlling instrument in the shape of the interference of state administration during the listing action in 1790–9166.

The excessive age misreporting in the Russian revisions also calls for a review of the prevailing administrative context of data collection. At the local level, the duty of conducting listings fell to the owners or administrators of estates (often supported by a deputy selected from representatives of local Polish or Lithuanian gentry; see Kędelski 2000, 103–104); that is, people who by definition would have been unlikely to have had the same degree of knowledge of the listed individuals as the local clergy. This was partly due to the enforced changes in ownership (especially after 1794)67, and partly to the likelihood that the local subjects viewed them with suspicion. The traditions of drawing up estate inventories which were established long before the final dissolution of the Polish-Lithuanian state mattered little in the context of the enumeration requirements imposed by the Russian governors, as the vast majority of the old magnate, royal, and noble estate inventories did not collect data on age, but information of a different sort. Moreover, the 1795 revision (the 5th Revision) was a complete novum on lands freshly incorporated into Russia after the third partition, and it was conducted in an environment that was linguistically alien and unfamiliar with the Russian administration. This situation gave the local deputies responsible for conducting the listing substantial leeway with regard to the quality of the collected data, which resulted in significant deficiencies in the registration of populations (Sikorska-Kulesza 1995, 9–15, 42; also Legun and Petrenko 2003, 10)68.

Accordingly, it is necessary to explain the exceptionally high quality of age registration in the group of listings previously labeled ‘non-standard.’ Over 80 percent of the individuals listed in these registers were in the territory of region 7, and three-quarters of the population in the region were in these types of listings, including the inhabitants of the Gryf dominion belonging to the magnate family Schaffgotsch from the Lower Silesian district of Lwówek. Listings for these localities were conducted on orders of the domanial administration in the years 1779–1805, but the actual responsibility for the census-taking fell to the governing bodies of the individual villages (Kwaśny 1966). The motivations for conducting the listings and the methods used in carrying them out are not known to us. What is, however, known is that the representatives of the central administration of the Prussian state believed that the domanial authorities had a high degree of familiarity with the relationships among the villagers. To a large degree, this familiarity came from having regularly compiled high-quality ‘soul registers’ (Seelenlisten) to meet the needs of the particular dominions (Kwaśny 1966). During nationwide conscriptions in Prussia, registers compiled by private administrators at the local level, such as the 1787 register (Ładogórski 1952, 71–72), were consulted. The presence of a stringent system of population registration is suggested also by the two remaining listings from the group discussed here. The register of the inhabitants of a Nowodworska domain in the Warmia region (region 1) was a remnant of nationwide surveys of Prussian populations. This register was part of the Prussian Cataster of 1772, a data source considered to be of high quality (Bachanek 1997; Cackowski 1967)69. 
The listing of the Czacz parish population was conducted by the local church administration on the orders of the national authorities of the Duchy of Warsaw in preparation for the universal listing of the population in December 1809 (known as the 1810 listing) (Borowski 1976, 100–103; on the listing – Gieysztorowa 1976, 91–105). Overall, it appears that the extent of involvement of local and regional civil administration in these enumerations significantly improved the way in which they were conducted, making these listings better than those of other regions. In these cases, the very process of surveying people created additional apparatuses that led to greater precision in population registration.

The findings of the preceding sections revealed a strong attraction to preferred final digits (or an avoidance of certain digits) in age reporting in the census microdata from historical Poland-Lithuania. A wide range of factors were associated with age rounding, but the type of survey the age information was derived from appears to have been of critical importance. Enumeration in the Russian revision lists was by far the most important predictor of age rounding in our data. A person’s age (i.e., being 35 years old or older) was considered less important than many other individual characteristics, including the person’s residential and marital situation. The age heaping pattern of the head of the residence group was found to have been equally important, as individuals living in a household in which the head rounded off his age were more likely to have misreported their own age than their counterparts living in a household in which the head had reported his age more accurately. Non-relatives in a domestic group and people living in a ‘no family’ residence exhibited a stronger pattern of digit preference than individuals in other categories.

These findings are generally common to inaccurately reported ages in almost any country, even though the intensity of the attraction for and the avoidance of certain figures may not have been as pronounced as they were in our case. On the other hand, the comparisons above make clear that the indicators calculated on the basis of our material were sometimes better than the data coming from other regions of pre-industrial Europe, and even from some contemporary developing countries. What matters most, however, is the recognition that the powerful effects of age heaping in our population preclude a direct, year-by-year analysis of the age-sex composition; and that other means of assessing the reliability of our data have to be considered.

1.6  Population age structure

Errors in age statistics may result from both mis-statements of the ages of those who were enumerated and from differences in the relative completeness of the enumeration of individuals in different age groups. Having accounted for the first type of problem in the preceding sections, we will now move on to the assessment of the reliability of age group classifications. One approach that is often used for determining more realistic age distributions in the aggregate population is dividing the population pyramid into intervals in line with the standard age group classifications widely employed in demographic practice (Young 1900; United Nations 1982; Hobbs 2008, 155–156). While in the earlier parts of the present chapter we focused on the proportions of two specific age groups – i.e., ‘the children class’ (aged 0–14) and the ‘grandparents class’ (aged 65+) – our next step will be to evaluate the proportions of consecutive age groups separated by five- or 10-year intervals, while acknowledging gender disparities and regional specificities.


Figure 20: Population age-sex structure in quinquennial age groups superimposed upon the structure based on single-age years, the complete CEURFAMFORM collection.

Source: CEURFAMFORM Database.


Figure 21: Population age-sex structure in quinquennial age groups superimposed upon the structure based on single-age years, the western cluster of Poland-Lithuania.

Source: CEURFAMFORM Database.


Figure 22: Population age-sex structure in quinquennial age groups superimposed upon the structure based on single-age years, the eastern clusters of Poland-Lithuania.

Source: CEURFAMFORM Database.

In the simplest kind of analysis of the age-sex data, the magnitude of these mutually relative numbers can be examined by looking at Figures 20–22. The figures represent two superimposed population pyramids, each of which shows the actual population in single years upon the same population represented with a five-year age group classification, first for the entire collection, and then for the two geographical clusters. When we look at the pyramids, we can immediately see that our strategy was not entirely successful in overcoming the avoidance of disliked digits or in smoothing out the rates of attrition. Although classification by these five-year intervals helped to substantially smooth out the rough edges caused by heaping ages, the effect of digit preference remained even in this conventional grouping due to particular distortions on marginal numbers like zero, four, five, and nine. In the absence of specific past events which could help us understand these persistent irregularities, the assumption of a smooth age distribution is reasonable, as it implies that the peaks and gaps were the result of certain preferences in reported ages or other errors in population coverage. To assess more thoroughly the departures from a regular pyramid in our census samples, the age ratios test can be used70. Table 16 gives the age ratios for males and females of all of the regions. Accordingly, Figures 23–24 plot the age ratios for the complete collection along with figures for the western and the eastern parishes, including comparative data for Polish territories from around 1900.
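The age ratios test can be illustrated with a short Python sketch. It assumes one common convention, in which the age ratio of a five-year group is its population expressed as a percentage of the average of the two adjacent groups; the exact variant used for Table 16 may differ. The function name and the counts are hypothetical.

```python
def age_ratios(counts):
    """Age ratio for each interior age group.

    `counts` lists the population of consecutive five-year age groups.
    Assumed convention: ratio_i = 100 * P_i / mean(P_{i-1}, P_{i+1}).
    The first and last groups have only one neighbour and are skipped.
    """
    ratios = []
    for i in range(1, len(counts) - 1):
        neighbours_mean = (counts[i - 1] + counts[i + 1]) / 2
        ratios.append(100 * counts[i] / neighbours_mean)
    return ratios


# Hypothetical counts for ages 0-4, 5-9, 10-14, 15-19 (not from the source).
example_counts = [120, 100, 110, 90]
example_ratios = age_ratios(example_counts)
print([round(r, 1) for r in example_ratios])
```

Under this convention a value close to 100 indicates that a group fits smoothly between its neighbours, while values well above or below 100 point to heaping, under-enumeration, or genuine demographic disturbances.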

In natural fertility societies, the proportion of the population in each age group should be greater than the share above it and smaller than the share below it, giving the pyramid its prototypical, broad-based, triangular form (Rowland 2003, 98–100). Deviations from this pattern in a high birth-rate society are possible if marked changes in the birth rate, the immigration or emigration rate, or the mortality losses affecting particular age groups have occurred. None of these conditions seems to fully apply to the historical populations represented in the figures above71. With the exception of two regions (9 and 7), the numbers of people aged 0–4 are smaller than the numbers of those aged 5–9, as the age ratios for the latter age groups were considerably above 100. This problem appears to have stemmed mainly from deficiencies in the enumeration of the youngest children (see above), and not from the overstatement of the ages of children under age five (United Nations 1952, 62). Across all of the regional populations except for those in group 7, fewer individuals were reported at ages 25–29 than at ages 20–24 or 30–34. It seems implausible that a large birth deficit in previous cohorts could account for this pattern.


Table 16: The age ratios for males and females, by regions of Poland-Lithuania.





Source: CEURFAMFORM Database.


Figure 23: The age ratios for males for Poland-Lithuania, its major regions, and for Poland around 1900.

Source: for Poland-Lithuania – CEURFAMFORM Database; for Poland around 1900 – Szulc 1936.


Figure 24: The age ratios for women for Poland-Lithuania, its major regions, and for Poland around 1900.

Source: for Poland-Lithuania – CEURFAMFORM Database; for Poland around 1900 – Szulc 1936.

Similarly patterned differences between quinquennial age groups were found for all of the respective cohorts in the succession, and irregularities among the oldest groups were particularly strong. It is improbable that fluctuations in birth rates over the century preceding the censuses could have produced these effects. Also peculiar was the fact that in the majority of the regions (and in the overall collection) males aged 30–34 outnumbered not only males aged 25–29, but also males aged 20–24; while in most instances this was not the case for females. This indicates that the ages of the males were misreported differently than those of the females, or that many men aged 20–29 were not enumerated. Without entirely rejecting the possible effects of underenumeration, these alternating excesses and deficiencies in the five-year age groups could also be attributed to the very powerful attraction exerted by figures which are multiples of 10, particularly age 30. Many of the regional peculiarities cancel each other out at the aggregate level, so that macro-regional differences in age composition revealed in the five-year age groups do not seem to take on particularly profound forms (Figures 23–24). Still, after the age of 29 the populations of the eastern territories tended to display more extreme age ratios than those of the west, on both high and low scores.

To better evaluate inter-regional differences in age reporting quality, it is helpful to derive a summary measure of variability exhibited in quinquennial age ratios. To this end, a mean deviation of age ratios from 100 percent for the successive age groups (averaged irrespective of sign) can be calculated. With the aid of such a measure, known as the age ratio score or the age-accuracy index (United Nations 1952, 61; Hobbs 2008, 148), the degrees of variability of the age ratios calculated from the results of all of the regional population counts can readily be compared (Table 17)72.
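As a rough illustration, the score can be computed as the mean absolute deviation of the age ratios from 100. The sketch below assumes that each age ratio is the group's population as a percentage of the average of its two neighbouring groups; the function name and the counts are hypothetical, and any adjustments applied in Table 17 (for example, for population size) are not reproduced.

```python
def age_ratio_score(counts):
    """Mean absolute deviation of age ratios from 100 (signs ignored).

    `counts` lists the population of consecutive five-year age groups.
    Assumed convention: ratio_i = 100 * P_i / mean(P_{i-1}, P_{i+1}).
    """
    deviations = []
    for i in range(1, len(counts) - 1):
        ratio = 100 * counts[i] / ((counts[i - 1] + counts[i + 1]) / 2)
        deviations.append(abs(ratio - 100))
    return sum(deviations) / len(deviations)


# Hypothetical counts, invented for illustration only.
smooth_score = age_ratio_score([300, 200, 100])         # evenly declining groups
irregular_score = age_ratio_score([120, 100, 110, 90])  # alternating excess and deficit
print(round(smooth_score, 2), round(irregular_score, 2))
```

A perfectly regular, evenly declining series gives a score of zero; the larger the score, the more the grouped age distribution departs from a smooth schedule.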

The differences revealed among the figures of the second and third columns of Table 17 overlap to a certain extent with the patterns of single digit preference displayed above. Compared to the other regions, regions 1 and 7 (also region 10) remained at the forefront in terms of the accuracy of their age structure, and they displayed by far the smallest average deviations from the standard age schedule. Similarly, smaller average deviations from the formulaic structure reflected the higher quality of female age registration relative to the rest of the collection. In a few instances, however, merging single year observations into quinquennial age groups resulted in a substantial shift in the region’s position in relation to previous quality assessments based on heaping indexes (as in region 2). In general, however, the classification into standard age groups confirmed the levels of diversification already found across the different survey types, and especially the much lower degree of accuracy of the age group proportions in the Russian listings.


Table 17: Measures of age-sex accuracy, by regions of Poland-Lithuania and type of enumeration.




Source: for 18th-century Poland-Lithuania – CEURFAMFORM Database; for later enumerations – own calculations based on Szulc 1936; for 1931 Poland – own calculations based on Rothenbacher 2002.

* Adjusted for population size.

** The corrected index may be negative if the sample size is very small.

1.7  Proportions of the sexes

Classification by sex is not only of fundamental importance in demographic analysis, it also provides us with additional insights into the issue of census data quality (Fauve-Chamoux and Sogner 1994; Poston 2006). Even though sex is seldom reported incorrectly in census enumerations, statistics classified by sex may still contain errors because of irregularities in enumerating individuals of particular sexes: i.e., one sex may be better represented in listings than the other. Thus, for the examination of the accuracy of age statistics, computing sex ratios is indispensable73.

In general, a population not much affected by migration should have approximately equal numbers of males and females, whereby a slightly higher number of males among children is usually more or less counter-balanced by a slightly higher number of females among adults. If the reported distributions are accurate (or if the errors for males are as frequent and of the same kind as those for females), sex ratios should change only very gradually from one age to another, as they are determined mainly by the sex ratio of births and the gender differences in mortality at various ages74. The presence of marked fluctuations in these ratios indicates errors which are not the same for the two sexes. This straightforward situation is often modified by various socioeconomic and place-specific factors, such as casualties caused by wars, gender-specific migratory movements, or parental manipulation (United Nations 1952, 60; 1955, 31; Fauve-Chamoux and Sogner 1994). These special conditions notwithstanding, if the sex ratios present a systematic picture we can assume that the two sexes have been enumerated with comparable accuracy. Deviations from the pattern which cannot be readily explained should be singled out for further investigation of their accuracy.

Our situation was complicated by the fact that so far the expert literature does not provide an established view of the specificity of sex proportions in a population in the feudal era in East-Central Europe (see Janczak 1975, 34; Kuklo 2009, 240; also Kopczyński 1998, 74–79). This would not present any great difficulties were it not for the fact that the ratios inferred from our regional data have substantial discrepancies, and thus different degrees of deviation from the theoretically expected proportions (Figure 25).


Figure 25: Sex ratios by regions of Poland-Lithuania.

Source: CEURFAMFORM Database.

The general ratio of men to women that has already been calculated for the whole collection (extreme left-side bar) and divided into particular regions suggests that caution should be used in the assessment of the investigated material. In an ‘ideal’ population which is not subject to interference from exogenous factors, but instead follows the standard age-specific mortality rates, the general population of women should be slightly more numerous than that of men (Hobbs 2008, 130–131). In the investigated populations, however, this was the case for only four western regions, and the numbers of men relative to women were particularly high in the eastern territories75. The sex ratios for half of the regions fell into the acceptable range of 95–102 (values below 90 and above 105 are usually viewed as extreme; Shryock and Siegel 1976, 107), and only two regions (both eastern) appeared to clearly exceed the acceptable value thresholds (regions 8 and 10). This was surprising given that the age registration in the listings from these regions, and especially from grouping 10, had previously been shown to have been of high quality. Sex ratios more favorable for males were typical of the eastern territories of Polish lands until the end of the 19th century, and sometimes even later (Szulc 1936, 16; Gawryszewski 2005, 208–209).

While the aggregate level did not depart too drastically from the expected patterns, the division into age-specific sex ratios uncovered several suspicious trends (Figure 26).


Figure 26: Age-specific sex ratios in western and eastern areas of Poland-Lithuania and in early 20th century Poland.

Source: for 18th-century Poland-Lithuania – CEURFAMFORM Database; for later enumerations – own calculations based on Szulc 1936; for 1931 Poland – own calculations based on Rothenbacher 2002.

Until the late teens, the sex ratios fluctuated quite strongly around 100 in both the western and the eastern clusters. This was followed by a substantial decrease in the numbers of males per 100 females between the ages of 15 and 29. In the older age groups, however, and especially among individuals over age 34, the prevalence of males was striking, as were the fluctuations of the ratios between the successive age groups. None of these observed trends inspires much confidence in the quality of the material, and the latter two trends seem to have been entirely erroneous, if systematic. The sex composition in the group of younger generations was a conglomeration of miscellaneous regional tendencies, the lion’s share of which were undeniably fallacious76. Omissions of males aged 15 to 29 pervaded many 18th- and 19th-century enumerations taken on Polish territories (and elsewhere), and were usually explained either by migration or, more frequently, by efforts to help young people avoid military conscription. After taking into account the size and spatial distribution of the examined populations and their mostly manorial character, we concluded that the first explanation was rather unlikely, but that the second explanation was well-founded, especially for the final decades of the 18th century (Kopczyński 1998, 75–76)77.

A deficiency of women in older generations could be seen in both major spatial groupings, though severely overestimated values of sex ratios (above 160) were most prevalent in the eastern regions. The trend could also be observed in regions with above-average quality of population registration (regions 1, 7, and 10). This phenomenon contradicts the basic laws of population age-sex distribution, as it is difficult to imagine what conditions could account for such a striking overrepresentation of males relative to females among both adults and the elderly (cf. Shryock and Siegel 1976, 110; Fauve-Chamoux and Sogner 1994; also Kuklo 1998, 48–49). Thus, in the older age groups we have to deal with substantial omissions of women, especially in the eastern regions.

The tendencies outlined above were more or less pronounced in all types of sources (Figure 27). Two main listing categories (the Commissions’ censuses and the Russian revisions) were similar. In addition, the sex ratio distortions typical of these listings were found to have been systemic in character; thus, with the exception of the oldest generations, these tendencies did not depart too drastically from the average value for the entire collection. A 20 percent deficit of men was followed by a distinct ‘over-production’ of males aged 35–49. The ratio evened out in part in the fifth decade of life, but the distortions returned in the oldest generations. As expected, the quality of the age-sex reporting in the non-standard listings (‘Other’) was much higher than that of other types of listings, although this specific case was also far from ideal. Listings from the status animarum group, though free of a pronounced overrepresentation of males relative to females among adults and the aged, suffered from erratic fluctuations in the ratio among the youngest and the middle-aged.

The exact problem of the male surplus in the later stages of life is hard to diagnose with any precision. However, there are several possibilities that could be considered. The systemic character of the sex ratio distortions noted in our material leads us to assume that the influence of gender differentials in digit preference may have also affected the observed tendencies. The pronounced but artificial tendency to reallocate individuals among particular age groups as a result of age heaping, which was unevenly distributed among the sexes, could have led to distortions in sex ratios in particular cohorts.


Figure 27: Age-specific sex ratios in Poland-Lithuania by type of the enumeration.

Source: for 18th-century Poland-Lithuania – CEURFAMFORM Database; for Poland 1900 – own calculations based on Szulc 1936.

In order to eliminate the potential effects of age clustering and to discern more clearly the central tendencies of the data, sex-specific five-year population totals from our collection were adjusted using the strong smoothing technique recommended by the United Nations (Arriaga et al. 1994)78. In Figure 28 the sex ratios obtained from the reported age schedules are plotted against those derived from smoothed distributions, along with the age-sex patterns derived from more recent enumerations.

As expected, adjusting the population age structure with the smoothing technique evened out the rough edges of the curves caused by the sawtooth fluctuation of sex ratios in the reported figures, but the overall trends remained. The underreporting of young males, while less pronounced, was still present in all of the configurations of our data, as was the increased attrition rate among women after age 35. Thus we can assume that the impression that elderly males outnumbered females at an accelerating rate could not be primarily attributed to the belated emergence of previously under-registered males (years 35–54) or to an artificial relocation of men in their fifties who deliberately exaggerated their ages (65+). These findings confirmed our earlier assumptions about the significant omissions of women in the older age groups79.


Figure 28: Reported and smoothed age-specific sex ratios in western and eastern areas of Poland-Lithuania.

Source: for 18th-century Poland-Lithuania – CEURFAMFORM Database; for later enumerations – own calculations based on Szulc 1936; for 1931 Poland – own calculations based on Rothenbacher 2002.

The potential deficiencies in the early modern evidence became particularly apparent when they were compared with later-date statistics. Around 1900, the most extreme fluctuations between the age-specific proportions of the sexes typical of earlier data could barely be found on Polish territories, and by the 1930s the age composition revealed in the census returns seemed to match the demographic standards closely.

However, some other facts that seem to suggest that the observed age distributions mirrored the actual proportions of the sexes in the population under investigation should be considered. It is, for example, known that in many high-mortality countries of 20th-century Asia, in which female mortality tended to be higher than male mortality, the proportions of the sexes often deviated substantially from the ideal in which the sex ratios tended to diminish with age. Although the application of such an idiosyncratic perspective to the analysis of historical sex ratios may have a certain appeal, it quickly becomes clear that doing so makes little sense, given that even in those non-standard populations the sex ratios among older adults and the aged normally did not rise above the 110–115 level (United Nations 1952, 70; 1988, 30).

Furthermore, some other scarce auxiliary evidence contradicts this view, showing that males had a higher probability of death than females at older ages in 18th-century Poland-Lithuania. These findings come from the province of Greater Poland (part of the western cluster) between 1816 and 1900 (Kędelski 1985; 1980). It is not entirely clear, however, whether this pattern might have been replicated in other territories of historical Poland-Lithuania, and some local case studies seem to suggest that it was not (Piasecki 1985). More serious inquiries into this matter are, however, inevitably hindered by a lack of more extensive comparative investigations of the demographic conditions of the late feudal period (see also Ch. 5).

Explanations for the deficits of females in our dataset that refer to their potential migration to urban centers appear to be equally unconvincing (Fauve-Chamoux and Sogner 1994; Kuklo 1998, 49–55). It is important to bear in mind that the more significant migratory movements from rural areas to towns and cities, especially in the case of migrations between estates of different owners, had to be rather limited in character under the conditions of serfdom typical of early modern Poland-Lithuania. It is also rather hard to imagine that these trends would have mainly affected older women. Moreover, the available data on the age structure and the sex ratios in urban populations of pre-industrial Crown lands do not provide clear evidence of a surplus of women in older age groups. In 18th-century Olkusz, Praszka, Radziejów, Wieluń, and even Warsaw the numbers of men aged 35 and older were higher than the numbers of women of the same age (Kuklo 1998, 45–46). The situation was basically similar in the towns and cities of the Grand Duchy of Lithuania80. We can therefore assume that defeminization was widespread in the listings of the towns and cities of the old Commonwealth, which might further confirm our conviction that there were deficiencies in our own registration81.

One way to wrap up the discussion above is to return to Table 17 (fourth column) and look at the values of the sex ratio scores conceived as the mean difference between the sex ratios for the successive age groups, averaged irrespective of sign82. Regardless of the type of spatial data configuration considered, the sex ratio scores fluctuated at well above 10 points, which strongly indicates the presence of widespread errors. The observed biases in sex ratios at successive age groups – particularly in regions 9, 2, 11S, and 8 – were greater than any conceivable irregularities of the age-specific sex structure that could have arisen from real causes.
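The sex ratio score described above can be sketched as follows: the sex ratio is computed for each age group, and the absolute differences between successive ratios are then averaged. The function name and the input counts are hypothetical; the age range actually used for Table 17 is not restated here, so the sketch simply uses whatever groups are supplied.

```python
def sex_ratio_score(males, females):
    """Mean absolute difference between sex ratios of successive age groups.

    `males` and `females` hold counts for the same successive age
    groups (hypothetical input layout, at least two groups).
    """
    if len(males) != len(females) or len(males) < 2:
        raise ValueError("need two equal-length series with at least two groups")
    ratios = [100.0 * m / f for m, f in zip(males, females)]
    # Differences between each ratio and the one for the preceding group,
    # taken irrespective of sign.
    diffs = [abs(later - earlier) for earlier, later in zip(ratios, ratios[1:])]
    return sum(diffs) / len(diffs)


# Hypothetical example: ratios of 100, 110, and 90 give differences of 10 and 20.
score = sex_ratio_score([100, 110, 90], [100, 100, 100])
print(f"sex ratio score: {score:.1f}")
```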

Accordingly, in the fifth column of Table 17, the values of the age-sex accuracy index (ASAI) are presented in order to provide a summary measure of error in the age-sex data derived from censuses. It is conceived as the combined sum of: a) the mean deviation of the age ratios for males from 100, b) the corresponding measure for the female population, and c) three times the mean of the age-to-age differences in the reported sex ratios (United Nations 1952, 1955)83. The census age-sex data are described by the UN as being ‘accurate,’ ‘inaccurate,’ or ‘highly inaccurate’ depending on whether the index is under 20, 20–40, or over 40, respectively (Hobbs 2008, 150).
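The three components of the index can be combined as in the sketch below. It assumes the common formulation in which the age ratio of a group is its size relative to the average of the two adjacent groups, times 100; the function names and the sample counts are hypothetical, and no adjustment for population size is attempted.

```python
def age_ratio_score(counts):
    """Mean absolute deviation of the age ratios from 100.

    The age ratio of an interior group is assumed to be
    100 * P[i] / ((P[i-1] + P[i+1]) / 2).
    """
    ratios = [
        100.0 * counts[i] / ((counts[i - 1] + counts[i + 1]) / 2.0)
        for i in range(1, len(counts) - 1)
    ]
    return sum(abs(r - 100.0) for r in ratios) / len(ratios)


def sex_ratio_score(males, females):
    """Mean absolute difference between successive sex ratios."""
    ratios = [100.0 * m / f for m, f in zip(males, females)]
    diffs = [abs(b - a) for a, b in zip(ratios, ratios[1:])]
    return sum(diffs) / len(diffs)


def age_sex_accuracy_index(males, females):
    """Male score + female score + three times the sex ratio score."""
    return (
        age_ratio_score(males)
        + age_ratio_score(females)
        + 3.0 * sex_ratio_score(males, females)
    )


def classify(index):
    """Label an index value using the thresholds quoted in the text."""
    if index < 20:
        return "accurate"
    if index <= 40:
        return "inaccurate"
    return "highly inaccurate"


# Hypothetical five-year counts with a visibly irregular pattern.
males = [1000, 900, 950, 600, 700, 800, 500]
females = [980, 910, 900, 800, 760, 700, 640]
index = age_sex_accuracy_index(males, females)
print(f"ASAI = {index:.1f} ({classify(index)})")
```

Because the sex ratio score enters with a weight of three, irregularities in the balance of the sexes dominate the index whenever the two sexes are misreported differently.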

Against the backdrop of such a rigid scheme, there seems to be no way to escape a negative assessment of the census data that form the present collection84. Apart from two regions and the non-standard category of listing type (‘Other’), all of the figures in the fifth column of the table were well above 100. The geographical and source-related differences in the data quality were again reaffirmed in the distribution of the index value: the western parishes scored somewhat better than the eastern parishes; regions 1 and 7 scored far better than all of the other regional census returns in terms of age-sex data quality; and the Russian revisions had far more reporting errors than the other types of listings. While adjusting for the population size lowered the values of the index for all types of data groupings, this bolstered our confidence only a little85. The index values in most cases remained above the 100 level, although due to the fitting processes regions 1 and 7 (like the ‘Other’ category of the listing type) reached a level of quality acceptable for the standards of contemporary demography (below 20). Meanwhile, the index value for the entire collection (116) was comparable to the values calculated from listing materials from African countries from the second half of the 20th century86.

1.8  Age schedules from Poland-Lithuania compared with other enumerations

One limitation of the above methods is that they do not necessarily tell us to what degree the differences in successive ratios and the deviations of age ratios from 100 should be ascribed to real causes (e.g., the peculiarities of the structure of the population) on the one hand, and to the inaccuracy of the data on the other (Hobbs 2008, 142, 150–151). When the results of other enumerations for comparable territories are available, it is often possible to clear up these uncertainties even without the use of more elaborate techniques. This was, unfortunately, not possible in our case.

A surrogate version of this approach could compare the age schedules from our regional samples with age distributions reported in one of the later population listings from the Polish-Lithuanian territories that were characterized by a wider territorial scope, a greater numerical base, and – presumably – better data coverage. Our reference age schedules were derived from the reconstruction of the population age-sex distribution on historical Polish lands from 1897/1900, which was proposed by Szulc (1936; also Gawryszewski 2005, 220; Gieysztorowa 1976, 101)87. Figures 29 and 30 illustrate convenient ways to summarize the differences between the age-sex patterns in our complete dataset and these later enumerations.


Figure 29: Age-ratio deviations in Poland-Lithuania and in later enumerations from Poland.

Source: for 18th-century Poland-Lithuania – CEURFAMFORM Database; for later enumerations – own calculations based on Szulc 1936.

In Figure 29 the fluctuations present in two age reporting patterns are represented by deviations of the age ratios from 100 percent, and are plotted against each other. The general trend is very similar in the two enumerations. In both enumerations the overrepresentation of the age groups was most likely to have occurred for those starting with even numbers, while the underrepresentation was most likely for those starting with five, with the major difference being that there was a much higher number of deviations for males than for females in our data than in the more recent material. Up to the mid-twenties, the magnitude of the departures from the expected ratios did not markedly diverge between the populations. A critical discrepancy between the two could, however, be seen in the subsequent age groups, in which the deviation from the standard was at least twice as high in the 18th-century material as it was in the age schedules from around 1900.


Figure 30: Sex-ratio differences in Poland-Lithuania and in later enumerations from Poland.

Source: for 18th-century Poland-Lithuania – CEURFAMFORM Database; for later enumerations – own calculations based on Szulc 1936.

On the other hand, the reported sex distributions seem to have been subjected to equally erroneous fluctuations over the quinquennial age groups in both of the populations, even though the errors often displayed a reversed pattern in the two (Figure 30)88. While the age-sex schedules obtained from more recent enumerations scored below the acceptable standards of demographic analysis (ASAI of 46), they were more than twice as good as the data from the present collection (ASAI of 116; see Table 17).

1.9  Domestic group members most affected by under-reporting

The underregistration of young males and the underestimations of older women discussed above lead us to speculate about which categories of domestic group members might have been most affected by these deficiencies.

It seems most reasonable to attribute the underrepresentation of men aged 15–29 years to lower levels of survey coverage for the non-married sons of the heads, as they were directly threatened by military conscription. Moreover, they constituted an absolute majority among all of the sons of household heads in this particular age group (at least 73 percent). Although in some parts of the Polish-Lithuanian Commonwealth peasants might have also been inclined to not report co-resident married offspring, in those cases the hiding of family members was, for obvious reasons, much less feasible. In seeking to explain the underestimations of women, it is worth referring back to Gieysztorowa’s assertion that this problem mainly affected women from poorer backgrounds, especially single or widowed women residing as lodgers in households run by non-related individuals (Gieysztorowa 1976, 134). Assuming this reasoning is not fundamentally misleading, we can ask what impact these drawbacks may have on our estimates of the domestic group structures, living arrangements, and other parameters under consideration.

While the insufficient registration of young, unmarried men may well lower our estimations of the mean size of domestic group (Ch. 10.4), it is only of secondary importance for observations on the structure of co-residence groups, and probably also for the assessment of individual residential arrangements89. Several other negative consequences should also be mentioned: i.e., the artificial lowering of the population at risk in the investigation of issues such as the home-leaving process (Ch. 6), the prevalence of life-cycle service (Ch. 7), and difficulties in determining the proportions of never-married individuals (Ch. 8). In all of these cases, the implied imprecision of our estimations could constitute an unavoidable limitation of our study.

Other risks are related to an insufficient registration of women, especially older women. A significant problem from the perspective of the analysis of co-residence structures would be a situation in which the bulk of the unregistered women were derived from the group of co-resident mothers of household heads. While females were consistently overrepresented among cohabitating parents relative to men, it cannot be ruled out that in some domestic groups this category of co-residents escaped being registered at all90. Our investigation takes on a slightly different shape if, following Gieysztorowa, we assume that the majority of female ‘missing souls’ were recruited from lodgers. Although this fact holds little significance for the reconstruction of the structure of domestic groups based on the rules set forth by the Cambridge Group, it can still limit our capacity to determine the real scale of the phenomenon of co-residence. Again, however, women were strongly overrepresented among the lodgers in the reported data; and even if our data indeed covered only a portion of the inmate population, the collected material proves more than sufficient to demonstrate contrasting patterns of non-kin recruitment strategies as they seem to have existed in Polish-Lithuanian western territories (Ch. 10). The scale of these discrepancies is large enough to preclude any suggestion that it was caused entirely by the under-registration of certain types of household members in either the east or the west of the Commonwealth.

1.10  The lack of a golden rule and available solutions

1.10.1  Fitting the reported data into standard age schedules

When the age structures of the population remain incorrect even after the individuals are regrouped into standard quinquennial age groups, as is the case here, it may be appropriate to try to correct errors in the reported age distributions by fitting the reported data into standard age schedules.

For this purpose, two alternative model life tables with reasonable growth rates were used as a standard, based on the assumption that the actual population is stable or quasi-stable91. To assess the parameters of the standard age distribution, the Coale-Demeny regional model life tables for stable populations were used (Coale and Demeny 1983). Two variants of the model ‘East’ were chosen for males and females, respectively, with the intent of approximating a range of mortality conditions and increases in the rates among rural populations of 18th-century Poland-Lithuania92. Two extreme ranges of life expectancy for men and women from the Coale-Demeny model ‘East’ were chosen. With the aid of this model we fitted and matched distinct intrinsic growth rates using the MORTPAK package for demographic measurement93. In the first case, we relied on values of e(0) = 26 years for males and 27 years for females (Piasecki 1985; Vielrose 1957; also Kuklo 2009), and an intrinsic growth rate of 0.5 percent; in the second case, the appropriate values for the life expectancy were increased to 27 and 28 years (Kędelski 1985; 1986; Piasecki 1990, 288), while also augmenting the rate of increase to one percent per year94.

Because the age registration was highly distorted by strong heaping, we smoothed the aggregated age-sex schedules for the entire CEURFAMFORM collection by applying light smoothing (Arriaga method) to the reported five-year age groups95. Adjusted in this way, the observed ages were subsequently fitted into the stable population distributions, separately for males and females. The outcomes of this exercise are presented in Figures 31–32, in which smoothed age distributions are juxtaposed with those obtained from fitting the smoothed data into model populations.
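The logic of a stable-population standard can be illustrated with a short sketch. It assumes the textbook relationship in which the share of each age group is proportional to the person-years lived in that group, discounted by the growth rate at the group's midpoint. This is not the MORTPAK procedure used for the book: the person-years values below are hypothetical placeholders rather than Coale-Demeny ‘East’ figures, and the function names are invented for illustration.

```python
import math


def stable_age_distribution(person_years, growth_rate, width=5):
    """Proportions by age group in a stable population.

    `person_years` holds life-table person-years lived (nLx) for
    successive age groups of equal `width`, starting at age 0.
    Each group is weighted by exp(-r * midpoint of the group).
    """
    weights = [
        lx * math.exp(-growth_rate * (i * width + width / 2.0))
        for i, lx in enumerate(person_years)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def dissimilarity(observed, model):
    """Index of dissimilarity: half the sum of absolute differences."""
    return 0.5 * sum(abs(o - m) for o, m in zip(observed, model))


# Hypothetical nLx values (NOT taken from any published model life table).
person_years = [3.6, 3.1, 3.0, 2.9, 2.7, 2.5, 2.3, 2.0, 1.7, 1.4, 1.0, 0.6]

model_low = stable_age_distribution(person_years, growth_rate=0.005)
model_high = stable_age_distribution(person_years, growth_rate=0.010)

# Hypothetical observed proportions with a deficit in the third and fourth groups.
observed = [0.15, 0.13, 0.10, 0.09, 0.11, 0.10, 0.09, 0.08, 0.06, 0.05, 0.03, 0.01]
print(f"dissimilarity vs r=0.5%: {dissimilarity(observed, model_low):.3f}")
print(f"dissimilarity vs r=1.0%: {dissimilarity(observed, model_high):.3f}")
```

A higher growth rate shifts the model distribution toward younger ages, which is why the two variants described above bracket a range of plausible age structures rather than a single standard.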


Figure 31: Age distribution of men from Poland-Lithuania: reported, smoothed, and fitted into stable population age schedules.

Source: for reported and smoothed – CEURFAMFORM Database (data for 71,595 men); for stable populations – Coale-Demeny 1983; Piasecki 1985; 1980; Kędelski 1985; 1986.


Figure 32: Age distribution of women from Poland-Lithuania: reported, smoothed, and fitted into stable population age schedules.

Source: as in Fig. 31. Data for 69,577 women.

As Figures 31–32 show, the match between the smoothed age distribution and the Coale-Demeny stable models is reasonably close, though not perfect. Male and female smoothed distributions fall into the sinusoidal-shaped light curves, each with its own peculiar deviation from the conditions simulated by the chosen model life table. The smoothed age composition among young men differs considerably from the model composition, and the conspicuous dearth of ages 15 to 29 presumably results from male underregistration, as already indicated in the preceding sections. On the other hand, the rise in the attrition rates among females is most conspicuous in late adult life and in old age. The discovery of high attrition rates among women at post-reproductive ages is consistent with the results obtained from the inspection of sex ratios, which strongly leaned toward female underenumeration at those ages. In the absence of additional new statistical evidence or powerful auxiliary knowledge, it is not useful to carry the investigation further. The upshot of this discussion is that, on the whole, our census returns report females more comprehensively and accurately than they report males for sub-adult and adult ages. At older ages female reporting deviates substantially from the expected values, and the age distribution of women is therefore less likely to be accurate than the age distribution of males.

1.10.2  Benefits and costs of unconventional age groupings

Although by means of adjustments the procedures applied above allow us to arrive at more realistic age distributions in the aggregate population, they are not really helpful when data on individual age are used as crucial demographic characteristics to cross-tabulate by the host of demographic or residential propensities of individuals96. Of the more usual approaches, dividing the pyramid into unconventional intervals can be applied to solve this problem (United Nations 1982), but which kind of grouping would be optimal from the point of view of minimizing heaping is not entirely clear (Hobbs 2008, 141, 155–156).

Ideally, the choice of an age group should be influenced by the manner in which digit concentration takes place (Young 1900, 27; Myers 1940, 406–407). The most accurate form of age group should be the one which includes with each year of concentration the years from which this concentration is chiefly drawn. This would mean that, of the two series of quinquennial age groups, the more accurate is the one in which the alternating surpluses and deficits in the size of the successive groups are less marked (Young 1900). The effects in demographic measurement and estimation of both digit preference and age misstatement can also be reduced by the use of wider – for example, decennial – age intervals. This, however, involves greater approximation, which in our case involves the need to compromise the meticulousness of the observations on at least some age-specific family or household-related behavior, such as leaving home or household formation.

On the other hand, the patterns of digit preference revealed by the Myers’ indexes of preference leave no doubt that the conventional quinquennial age grouping (0–4, 5–9, etc.) does not fully capture the true distribution of population among the different age cohorts. Compared to this conventional age categorization, the grouping in which the third or median year of each group represents a year of concentration (8–12, 13–17, etc.) may seem to be a more suitable candidate. This approach would have the advantage of allowing us to form the age groupings around the most popular years and could thus reach a better correspondence with natural associations between the many individuals from our dataset who thought of themselves as vaguely ‘aged 30’ or ‘aged 35’ (cf. Herlihy and Klapisch-Zuber 1985)97.
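For readers unfamiliar with the measure, the sketch below shows one common formulation of Myers’ blended index, computed from single-year counts for ages 10 to 99. The exact age range and variant used for the indexes reported in this appendix are not restated here, so the range, the function name, and the sample data are assumptions made for illustration.

```python
def myers_blended_index(single_year_counts):
    """Myers' blended index of digit preference (0 = none, 90 = maximum).

    `single_year_counts[a]` is the population reported at age `a`;
    the sequence must cover ages 0-99. Assumed variant: ages 10-89
    form the first sum and ages 20-99 the second.
    """
    if len(single_year_counts) < 100:
        raise ValueError("counts for ages 0-99 are required")
    blended = []
    for digit in range(10):
        # Ages ending in `digit` within 10-89 and within 20-99.
        first = sum(single_year_counts[10 + digit + 10 * k] for k in range(8))
        second = sum(single_year_counts[20 + digit + 10 * k] for k in range(8))
        # Weights (digit + 1) and (9 - digit) blend the ten possible starting ages.
        blended.append((digit + 1) * first + (9 - digit) * second)
    total = sum(blended)
    shares = [100.0 * b / total for b in blended]
    # Half the sum of absolute deviations from the expected 10 percent.
    return 0.5 * sum(abs(s - 10.0) for s in shares)


# Hypothetical data: an even base with extra reports at ages ending in 0.
counts = [100] * 100
for age in range(20, 100, 10):
    counts[age] += 80
print(f"Myers' index: {myers_blended_index(counts):.1f}")
```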

However, this argumentation would only hold true if the erroneously reported years were drawn equally from the nearest digits above and below the most preferred year within the group. Unfortunately, everything we know about the patterns of misreporting that dominate our data suggests that the prevailing tendency was not simply to move the age to the nearest year of concentration, or even to the age two digits away (see above). The concentration on multiples of 10 was extremely strong in the Polish-Lithuanian data, which seems to suggest that those who contributed to it might have been drawn from several digits on both sides of the concentration year. In practical terms it means that by applying the grouping based on median years of concentration, we run the risk of enlarging discrepancies between the subsequent age groups even more than the conventional quinquennial classification.


Figure 33: Age-ratio deviations for conventional and unconventional age groupings by sex, Poland-Lithuania.

Source: CEURFAMFORM Database.

This point can be illustrated by calculating the age ratios, along with the respective deviations from them, for the conventional and the non-standard groupings, with the latter being the one in which the year of concentration is represented by the third or median year of each group (Figure 33). For most of the age groups displayed, the classification following non-standard rules yields age ratio deviations larger than those of the classic groupings, particularly for the female population, which indicates that the alternating surpluses and deficits in the sizes of the successive groups are more pronounced as we abandon the standard approach. The parameters displayed in Figure 33 seem to preclude a useful application of the alternative quinquennial groups to our dataset.
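A comparison of this kind can be sketched as follows: single-year counts are summed into five-year groups starting either at age 0 (conventional) or at age 3 (so that ages ending in 0 and 5 fall in the middle of each group), and the mean absolute deviation of the age ratios from 100 is computed for each series. The helper names and the synthetic data are hypothetical, and the age ratio is assumed to be the size of a group relative to the average of its two neighbours.

```python
def group_counts(single_year, first_age, width=5):
    """Sum single-year counts into successive groups of `width` years."""
    last_start = len(single_year) - width
    return [
        sum(single_year[lo:lo + width])
        for lo in range(first_age, last_start + 1, width)
    ]


def mean_age_ratio_deviation(groups):
    """Mean absolute deviation of the age ratios from 100."""
    ratios = [
        100.0 * groups[i] / ((groups[i - 1] + groups[i + 1]) / 2.0)
        for i in range(1, len(groups) - 1)
    ]
    return sum(abs(r - 100.0) for r in ratios) / len(ratios)


# Hypothetical single-year counts for ages 0-79 with heaping on multiples
# of 10 that draws reports from two years on either side.
single_year = [100] * 80
for age in range(20, 80, 10):
    single_year[age] += 40
    for offset in (-2, -1, 1, 2):
        single_year[age + offset] -= 10

conventional = group_counts(single_year, first_age=0)   # 0-4, 5-9, ...
median_based = group_counts(single_year, first_age=3)   # 3-7, 8-12, ...

print("conventional:", round(mean_age_ratio_deviation(conventional), 2))
print("median-based:", round(mean_age_ratio_deviation(median_based), 2))
```

Which grouping performs better depends entirely on where the heaped reports are drawn from, which is the point made in the text; the synthetic pattern above is only one of many possibilities.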

A comparison of the exemplary results obtained by dint of alternative grouping (e.g., 5–9 years vs. 3–7 years, etc.) points in the same direction by suggesting that the difference in the estimates is not particularly large, which is all the more significant considering that the general deficiencies of the statistics of the Polish early modern age rule out in principle the prospect of absolute accuracy of the potential estimations. Our attention is drawn, for example, by the surprisingly small distortions of the SMAM estimations derived from a threefold grouping of ages (Table 18) – a crucial matter given the importance of the variable ‘age at marriage’ in typological approaches to family systems. In all cases we reach nearly identical results (differences of only about half a year), which appears to call into question the need to treat the two distinct groupings as mutually exclusive alternatives.

Table 18: Measures of SMAM according to different categorizations of age, by sex; Poland-Lithuania (complete collection).

| Age grouping | Males | Females |
|---|---|---|
| Standard (15–19, etc.) | 24.25 | 20.30 |
| Non-standard (13–17, etc.) | 23.75 | 19.96 |
| Single years | 24.04 | 20.05 |

Source: CEURFAMFORM Database.
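The singulate mean age at marriage (SMAM) is derived from the proportions never married in successive age groups. The sketch below follows Hajnal's standard calculation for five-year groups; the text does not spell out the exact variant used for Table 18, so this is an assumption, as is the `start_age` parameter introduced to accommodate a shifted grouping such as 13–17. The proportions are hypothetical.

```python
def smam(prop_single, start_age=15, width=5):
    """Singulate mean age at marriage from proportions never married.

    `prop_single` holds eight proportions (as fractions) for successive
    age groups beginning at `start_age`, e.g. 15-19 through 50-54.
    """
    if len(prop_single) != 8:
        raise ValueError("eight age groups are required")
    # Proportion still single at the end of the marriage span,
    # taken as the average of the last two groups.
    final_single = (prop_single[6] + prop_single[7]) / 2.0
    if final_single >= 1.0:
        raise ValueError("nobody marries; SMAM is undefined")
    end_age = start_age + 7 * width  # 50 for the standard grouping
    # Person-years lived single up to `end_age`.
    years_single = start_age + width * sum(prop_single[:7])
    return (years_single - end_age * final_single) / (1.0 - final_single)


# Hypothetical proportions never married for 15-19, 20-24, ..., 50-54.
males = [0.98, 0.70, 0.30, 0.12, 0.08, 0.06, 0.05, 0.05]
print(f"SMAM (hypothetical males): {smam(males):.2f}")
```

If everyone were single up to age 25 and married afterwards, the proportions would be 1, 1, 0, 0, 0, 0, 0, 0 and the function would return 25.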

Larger distortions are bound to occur when different ways of grouping into age cohorts are applied to an investigation of selected individual residential characteristics (Figure 34). These are particularly visible in younger age groups, especially between 20 and 30 years of age, while among older generations differences stemming from alternative grouping evened out almost entirely. In the example provided in Figure 34, the age-specific share of people with a given characteristic differed on average by no more than three percent between the two types of age categorization, and only in two specific cohorts did the difference for the variable ‘lives with at least one child’ exceed 10 percent.


Figure 34: Age-specific percentages of exemplary living arrangements according to different categorizations of age data, by sex; Poland-Lithuania (complete collection).

Source: CEURFAMFORM Database. ‘Standard’ and ‘non-standard’ groupings defined as in Table 18.

Finally, the differences produced by an alternate grouping are not large enough to incline us to unequivocally embrace one or the other method, especially given that choosing either of them enables us to distinctly present the regional diversification of co-residence patterns and individual stages of life across Poland-Lithuania. Since there seems to be no clear alternative to the quinquennial age grouping, the approach taken in the book adopts standard five-year age groups ending in four and nine as a departure point for all subsequent counts. Apart from being driven by the observations made above, this decision was also motivated by our prime concern about the comparability of the Polish-Lithuanian materials with other datasets. After all, applications of the non-standard age grouping in historical family studies remain few and far between (Herlihy and Klapisch-Zuber 1985).

1.11  Concluding remarks

The main purpose of this chapter has been to bring to light the dangers lurking for a scholar who attempts an analysis of the population enumerations from the historic Polish-Lithuanian Commonwealth. Our findings point to the need for two general caveats. One is the awareness that we are dealing with data that hardly meet all of the standards of precision usually demanded by demographic scholarship. Indeed, it would be fairly easy to criticize the presented collection of historical census microdata for its many inconsistencies, inaccuracies, misspecifications, and misreporting. However, we hope that all of these problems have been reviewed and discussed in this lengthy technical chapter. As the conclusions of the exercises presented on the preceding pages point to the advantages of checking data quality using multiple criteria, they also make it quite plain that any scholar who makes use of these data should do so only with an awareness of the limitations and biases which those various measures have uncovered.

However, it would be entirely misleading to underrate the significance of these mostly 18th-century listings. Not only do their remnants represent rare surviving pieces of enumerations that were impressive achievements in their own times; they also provide a wealth of quantified information of interest to any student of eastern central European demographic history. Dismissing the listings discussed here on the grounds that they do not meet contemporary standards of data quality would be akin to a conscious refusal to engage in any explorations at all of historic populations. While it seems advisable to exercise caution in the use of the present data, it also appears that the various tests we have performed on that material support the assertion that the fear of inaccuracy should not constitute a major impediment to researchers using this evidence. This is why we believe that the conclusions of this exercise are generally reassuring.

It should be emphasized that the systematic errors and biases uncovered through the analysis presented in this chapter are not decisive enough to preclude the meaningful analysis of residence groups and the living arrangements of individuals. The errors in the census statistics discussed so far ultimately limit the value of our approach only in certain cases. However, for most purposes serious deficiencies in the quality of our data can be handled (minimized or even overcome) by simply (1) eliminating the most troublesome objects (i.e., parishes or estates) from the analysis while still keeping the sample big enough to be spatially representative, or (2) by focusing on subsets of the population (for example, by choosing men over women when dealing with the living arrangements of elderly people, as in Ch. 10). In other cases, the deficiencies that cannot be removed should have only very modest consequences for the precision of our quantitative assessments. Moments of greater uncertainty and risk cannot be avoided; thus, we should assume that operating with tentative estimations will help us avoid situations in which the acceptance of our results would require a greater amount of faith and suspension of disbelief than most scholars would be able to muster. This might not be the most appealing strategy, but it at least provides solid foundations for a thorough yet careful analysis of residence patterns. Potentially unverifiable hypotheses and observations are difficult to escape in historical demography. Thus, any inferences that must be made due to a lack of certainty should be openly discussed.

1 The ‘prestatistical age’ is commonly regarded as being the period before the introduction of national population censuses (Henry 1968b, 385–386; Rosenthal 1997, 219–222; Del Panta et al. 2006, 597).

2 However, neither the later 19th-century nor the more contemporary censuses could avoid pitfalls pertaining to population registration; see e.g., Ładogórski 1969; Steckel 1991; King and Magnuson 1995.

3 Even the Prussian administration, one of the most statistically advanced bureaucratic machines of the 18th century, did not manage to avoid certain shortcomings in dealing with this matter (see Ładogórski 1954, 37–40).

4 Listings which could not be classified this way had to be excluded from an analysis of some aspects of household composition or living arrangements.

5 Just as the number of people in this category was generally small, so was the danger of unbalancing the general shape of the household composition in the Belarusian listings. Lodgers constituted only three percent of the total population in central Belarus (region 11N) and less than 1.5 percent in the south of the country (region 11S). In the published edition of the 5th Revision for the Bratslav province from 1795, this category of people was explicitly referred to as ‘those not settled [nieosiedli] having no houses of their own’ and ‘living by those settled [u osiedłych; i.e., those in the possession of their own dwellings]’ (Legun and Petrenko 2003, 221, 251).

6 With regard to the feudal reality of central and western Polish lands and Upper Silesia, the lodging population (komornicy or Hausleute/Inwohnern, as they were called in German-language listings) denoted those individuals who did not possess premises of their own, but who lived in buildings belonging to others and in some cases paid rent for the occupied rooms; but only provided they did not remain in service or other labour relations to the owner of the buildings. Under the personal, ground, and judicial-administrative subjection which dominated the majority of the Polish and Silesian lands under analysis here, the situation of komornicy was usually determined by a specific relationship of dependence on the liege lord (and not to the peasant making use of a given corvee household) which entailed the obligation to pay the so-called tutelary fees (equivalent of rents) or the duty to perform a particular number of days of corvee, which was treated as a komorne ‘fee’ for the owner of the household buildings (a corvee-bound peasant householder usually could not be the owner of the house) (see Orzechowski 1956; Rusiński 1987, 156–157; also Shapiro 1960). There are clues suggesting that 18th-century lodging on Ruthenian lands could have had a different morphology, though. It cannot be ruled out that the archaic form of land co-management by individuals who were not necessarily related – which still existed there in early modern times under the name of sjabrinstvo, połowinnitstvo, podsusiedstvo, or dolnitstvo (see Lubomirski 1855a, 212–217; Ochmański 1957, 51–52; Pochilevich 1957, 15; Kernazhȳtski 1931, 107–10 ; see also Ch. 1, vol. 1 8) – could have, in the course of the social transformations in the following centuries, led to the formation of a subpopulation who did not possess separate farmsteads, but instead had plots of land under tenancy from people who also rented out small parts of their premises. 
On the Belarusian lands, the problem of the so-called bobyli proved to be equally complex. In the Minsk province this term denoted the population registered in the revision lists following the enumeration of all of the householders, usually in separate blocks, but without describing them as ‘huts,’ ‘mansions,’ or ‘property’ (e.g., Legun and Petrenko 2003). In all likelihood, a portion of the bobyli co-resided with household heads with whom they were linked by lease contracts of some kind, but there were also bobyli who owned separate dvory (huts) (Shapiro 1960). However, works on the socioeconomic realities of the Grand Duchy of Lithuania clearly indicate that in residential terms, bobyli were treated like the lodging population of the western regions of the Polish Republic; i.e., as persons who did not form separate dyms, but who shared ‘a corner’ of the house (Pawlik 1918, 45–48; Łowmiański 1998, 131; Kernazhȳtski 1928, 84–85; also Tokc 1999). A separate category of co-residents of landowning peasant households in the Ukraine consisted of podsusiedki (literally: ‘sub-neighbors’) who normally did not possess land or huts, and whose social and residential characteristics matched those of komornicy and kątnicy from Polish lands (Litvin 2006, 114, 141, 272).

7 This ignores the fact that the position held by lodgers within the structure of a household – and the very term komornik – was ambivalent and dependent on local historic, economic, and legal conditions. It appears that in the feudal reality the degree of this group’s integration with the households in which they lived was relatively low. Considering the diversity of forms of acquiring a means of livelihood by the lodging population it would be difficult to point to an area in which a lodger’s subjection to the authority of the household head would have been fully developed. See also Ruggles and Brower 2003 on the inconsistencies in the registration of unrelated persons, such as lodgers or boarders in the 19th- and 20th-century US censuses.

8 Exemplary cases are: ‘by the same [household head] komornik’ (Orle, region 3); ‘in the same house’ (Raciąż, reg. 3; Dałyń, reg. 8); ‘in the same hut’ (Polajewo, reg. 3); ‘opposite, in the second room, under one roof’ (Przyłęk, reg. 6); ‘together with him [i.e. household head] living’ (Luboml, reg. 8); ‘by the same [household head] staying’ (Gizowszczyzna, reg. 10). In the exemplary record from the village of Parzynów (region 4), one encounters ‘Cottage (chałupa), in it residing: Wincenty Szczepaniak aged 25, Jadwiga his wife aged 22, their children – Jakub son aged 4, Zofia their mother aged 60; in the same hut komornik Michał Kałuża, aged 30, Ewa his wife aged 27, their children – Maryanna daughter aged 4, Katarzyna daughter aged 2’. The jointness of a lodging and householding family could seldom have been more loose in nature, although it would still have consisted of a group of people in broad residential unity (‘in a shed by this hut staying;’ the Połajewska parish, region 3).

9 This group comprised 90 percent of all individuals lacking age information. Over one-third of them came from five enumerations in which age was entirely omitted in the survey; 41 percent were part of another four censuses, each of which left no evidence of age for around half of the population. Finally, 13 percent of the respective persons were assigned to a further four listings in which age registration was missing for between one-fifth and one-third of the populations. Partial omissions of age seem to have occurred for different reasons in the remaining two groups of surveys.

10 Region 9 contributed a 21 percent share.

11 Only very few of the English household listings used in the research of the Cambridge Group contained information on age (see Wall 1987).

12 In many of the English household listings from 1801–1831, marital status was provided only for the household heads (see Wall et al. 2004).

13 Over 70 percent of males in this category were co-resident servants.

14 Incomplete information for the lodging population is likely to be common in censuses which were taken by enumerators who may have spoken to only one member of a large household (usually the head) that may have contained multiple family groups.

15 The following example illustrates this point. In the Bujaków parish in Upper Silesia (region 7), in the household of Wojciech Cipa and his wife Jadwiga (the hostess’s maiden name was not given in the listing), the widow Jadwiga Tartas made her living, and was described as ‘lodger.’ However, an analysis of the parish registers revealed that the hostess came from the Tartas family, and that the co-resident widow was actually her mother, prompting a recoding of the residential characteristics of the domestic group and its members: from a nuclear to an extended type of domestic group; from living with non-relatives to living with married child (for lodging mother); and from living with a spouse only to living with at least one parent (for the head’s spouse).

16 In one of the households from 1791 Lesser Poland (the Przyłęk parish in region 6), one of the household’s sons, aged 27, was described as a farmhand and his co-residing wife was described as a lodger. In another parish from this region (Żóraw), the daughter-in-law was described as a maid. In all of these cases, when determining a given person’s household membership priority was given to kinship ties.

17 The phenomenon occurred more frequently only in the Lower Silesian (Sudetian) parishes. There, a substantial portion of the co-residents described as lodgers, Hausleute, or persons on a retirement pension (Auszügler; or im Ausgedinge) were found to have been the household head’s parents after the criteria of surname and age compatibility were investigated. It is also in this region that certain forms of provision for elderly retired parents existed; i.e., the elderly person’s position was made distinctive in a household headed by the younger generation (Szołtysek 2003; 2007; see also Ch. 10).

18 The investigation in this case is limited to selected regions from the Polish-Lithuanian western territories because lodgers occurred in larger numbers in these regions only.

19 Redoing the calculation for lodgers and kin aged 50+ increases the value of the coefficient of determination only marginally (R² = 0.064).

20 This, of course, is not the case with stepchildren. However, as they are only very rarely explicitly mentioned in our listings, the decision was made to treat them jointly with the group of children.

21 In either case, such a domestic group would be treated as an extended family type 4 according to Laslett’s scheme; equally, in either case, the extended kin would be coded as living with married offspring. It would only matter for the children, as only one of them could be treated as ‘living with parent.’

22 As expected, particularly in regions 4 and 9, but also in regions 3 and 5.

23 All of the persons who could potentially be tied through the parental link, but for whom it was not specified whether they had acquired their kinship through the head or his spouse, were also treated as unproblematic. For example, it was assumed that whenever there was a co-resident mother of either the head or his spouse in the house, together with a brother of either the head or his spouse, they would form one CFU.

24 The latter problem is, however, not relevant to data taken from the Russian revision lists (regions 11N and 11S), in which the genealogical information about women is enhanced by the addition of their father’s names (patronyms) to their own names.

25 If parishes from the regions most afflicted by a deficiency of the registration of kinship relations (regions 4 and 9) are excluded, all of the correlation parameters drop, although the relationship remains statistically significant (r = –.277, p>.001, for R² = .076).

26 As in the previous case, all of the co-resident relatives other than children, grandchildren, and children-in-law were investigated.

27 It could perhaps be supposed that a substantial portion of the hostesses’ relatives were in fact the misreported relatives of a deceased spouse. Indeed, in Belarusian revisions the sons of the brothers of the deceased heads were recorded as the nephews of female heads; however, an analysis of patronymic names enabled us in the majority of cases to properly establish the interrelationship pointer. In this sense, the numbers included in Table 10 pertain to cases in which there are no solid grounds for doubting the kinship relationships to the female head of the household, as has been established in the source. The meticulousness of the records compiled by some of the census-takers is exemplified by a description of a hearth in one village from the Sudetian region in Silesia (region 7): it was headed by a 35-year-old widow, co-residing not only with her widowed mother but also with the single/unmarried brother of her deceased husband!

28 Overenumeration is assumed to be much less prevalent in historical census-takings. In fact, much of the potential for overenumeration stemmed from age misreporting among certain age groups (the elderly in particular; see sections below).

29 An auditor of magnate estates in central Poland complained in 1787 about the hardships involved in conducting a listing because the ‘peasants, for some inexplicable reason, used to closely guard their children’ (Gieysztorowa 1976, 134, fn. 104). Still, in 1897 Russian peasants used to escape to the woods to run away from being enlisted. In Turkey during the 1940 listing there was a strict ban on leaving home while the enumeration process was taking place (see Rosset 1964, 70–71). Another set of reasons for registration deficiencies were given in P. Őri’s study on the Hungarian conscriptio animarum from the 1770–80s, which was ordered by the Habsburg Monarchy and carried out by clergymen. In this case, the observed 5 percent to 10 percent underregistration was caused not so much by the subjects’ attempts to avoid registration, but by deficiencies in the pre-formulated questionnaires, in which there was no room for some categories of household members, such as relatives, widowed persons, and inmates (see Őri 2003, 124–136).

30 The tendency to focus on these age groups originated from Sundbärg’s observations, who claimed that the discrepancies in the age structure of populations of particular nations may well be reduced to differences in the relative size of the ‘class of children’ (aged 0–14) and the ‘class of grandparents’ (aged 50+) (Sundbärg 1907). In contemporary demographic studies, especially those concerned with age dependency ratios, age 65 is usually seen as the lower limit when defining who is elderly (Hobbs 2008; Rowland 2003; Bongaarts 2002).

31 Rosset (1964, 209–210) collected data from the UN demographic yearbooks up to 1958, and found that there were only two societies with a proportion of the elderly below two percent in the 20th century (Maldive Islands 1946, and Ghana 1952). Nevertheless, he assumed that these levels were theoretically possible under ‘demographically primitive conditions.’

32 The coefficient of variation is defined as the standard deviation of a variable divided by its mean.
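As a minimal sketch of this definition, the function below divides the standard deviation of a list of values by their mean. Whether the population or the sample standard deviation was used in the book is not stated, so the use of the population form here is an assumption, and the input values are invented for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean.

    Assumption: the population standard deviation (pstdev) is used;
    the sample form (stdev) would give slightly larger results.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(values) / m


# Hypothetical regional shares of children aged 0-14 (percent)
shares = [38.3, 39.7, 40.1, 42.3]
print(round(coefficient_of_variation(shares), 3))
```

Because the measure is a ratio, it is unit-free and allows the dispersion of variables with very different means to be compared.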

33 However, the above-mentioned proportions of children and youths were not confirmed in any of the urban communities of the pre-partition Poland investigated by Kuklo, in which they only slightly exceeded 33 percent of the total population in most of these settings (Kuklo 1998, 45–46).

34 For the mortality conditions in the pre-transition era (actually, however, on the basis of data from one parish from central Polish lands), Vielrose estimated theoretical (stabilized) age structures in which the share of children between zero and 14 years of age amounted to 42.3 percent for a population whose intrinsic growth rate was 0.4 percent. In reference to these calculations, Kędelski called a 38.3 percent share of the minors for the Ovruch district in 18th-century Ukraine ‘very low,’ perhaps due to an incomplete registration of the children in this district (Kędelski 1990).

35 For the western and eastern voivodships of Poland-Lithuania, the appropriate values amounted to, respectively, 40.1 and 4.8 percent, and 39.6 and 3.6 percent. The territorial reach of Szulc’s categorizations referred to Polish lands within the interwar borders (1918–1939), which means that they did not encompass the totality of the eastern regions of the pre-partition Commonwealth. They did not include region 10 from our database, the majority of region 11N, or approximately half of region 11S.

36 Life expectancy estimates for historical Polish-Lithuanian territories are discussed in Ch. 5. All of the 18th-century marriage cohorts from Knodel’s study of 14 German villages fit almost perfectly into the age-specific fertility schedules typical of non-controlling populations.

37 Indeed, all of the auxiliary information included in Table 8 comes from the population enumeration of 1789, which preceded the listings of the Civil-Military Commissions included in our database. In contrast to the listings of the Civil-Military Commissions, the enumeration from 1789 was not an individual listing. Census sheets for particular villages were filled in by the owners of the estates. They provided the aggregate numbers of inhabitants, identifying the unmarried male offspring as being below or above the age of 15 (see Rusiński 1970, 74–75; Gieysztorowa 1976, 100–116).

38 It cannot be ruled out, however, that the 1789 materials were inherently flawed, as has already been suggested at several points; see Rusiński 1970, 74–76; Gieysztorowa 1976, 115.

39 Additional clues regarding the potential underregistration of two broad age groups in our western regions (particularly regions 2, 4–5; partly 3) are provided by a comparison of our estimates with data collected by Borowski for the province of Greater Poland at the time when it was part of 19th-century Prussia (Borowski 1963, 89). In 1816 in rural regions of the province, the percentage of people below age 15 amounted to 39.7 percent, and thus was slightly higher than in our 18th-century materials included in Table 8. Our estimates could, however, be defended after we take into account the fact that Borowski’s data came from the time period after the abolition of serfdom – an event which may well have paved the way for the more rapid growth of the province’s population – and thus might not be fully in line with our values. However, this reasoning does not help to explain the share of elderly people (aged 61 and older in Borowski’s account), because these values exceed comparable data from our material. According to Borowski, persons aged 61 years and older constituted five percent of the population of Greater Poland in 1816. All of our regions indicated above displayed lower values, although two of them almost approached that level (region 2 with 4.8 percent, and region 4 with 4.7 percent), which definitely suggests that the census coverage was good, at least in those groupings.

40 Most scholarly interests in co-residence patterns of the elderly pertain to assessing the extent to which this subpopulation shared domicile with grown-up children, i.e. the potential care-providers for the aged.

41 Age heaping is treated here as a source of distortion in age-specific vital (or residential) rates, which needs to be removed or at least minimized in order to study the family or household variables. However, age heaping can also be studied as a topic of interest in its own right. Economists and economic historians have been increasingly interested in using age heaping as an indicator of basic numeracy (Stockwell 1966; Stockwell and Wicks 1974; also Kaiser and Engel 1993), and as such, as an indicator of human capital (A’Hearn et al. 2009; Crayen and Baten 2010; De Moor and Van Zanden 2010).

42 Materials originating from the 18th-century urban court rolls from the Polish territories, especially those revealing biographical elements in the evidence presented by suspects of peasant origins, provide unique opportunities for capturing the age awareness of individuals in pre-industrial Poland-Lithuania. Here is how their age was defined before the court by various arrested ‘free persons’ in 18th-century Masovia (east of region 3; see also Ch. 6): ‘I have more than 20 years’ (Turska 1961, 37); ‘How many years I have – I do not know’ (1746; ibid., 61); ‘I have 18 years or more’ (1749; ibid., 67); ‘about 45 years I have’ (1757; ibid., 77); ‘I am years more or less 40’ (1758; ibid., 79); ‘I passed 60 years’ (1761; ibid., 83); ‘I am about 40 years old’ (1776; ibid., 88); ‘I have maybe 30 years’ (1788; ibid., 97); ‘I am a little more than 18 myself’ (1793; ibid., 120). Another phenomenon that has been found to be pertinent to household survey processes in less developed countries is that for a significant proportion of individuals in all age groups, age reports were quite often based on the opinion of certain third parties, such as more literate neighbors or relatives (Ewbank 1981, 5–6).

43 Regions 1 and 9 were omitted as they were composed of relatively small populations. The pyramid for region 11N was also omitted on the grounds that it was almost perfectly represented by that of region 11S.

44 The proportion of individuals with ages concentrated on 30 years within the total population aged 23–62 varied across regions from 3.6 percent to 20.4 percent; and for those with ages rounding on 40 years from 2.9 to 17.9 percent, respectively.

45 The share of people with ages ending with those two categories of digits was almost equal in the aggregate pyramid (29.3 to 28.5 percent).

46 The original Whipple’s index is calculated as the ratio of the number of people reporting an age that is a multiple of five to one-fifth of the total number of people in the age range 23–62 (in demographic terms the most stable population group), multiplied by 100. A Whipple’s index value of 500 would indicate perfect heaping on multiples of five; a value of 100 would indicate no heaping at all. All values below 100 suggest ‘anti-heaping,’ meaning that the population is concentrated on ages that end in neither zero nor five. A Whipple’s index of zero is theoretically possible and would mean a complete avoidance of ages ending in five and zero. However, values below 95–100 are uncommon. Some limitations of the index are related to the fact that in older age groups frequencies of some categories of age may differ significantly due to mortality effects, even in the absence of age heaping (Hobbs 2008; see also Spoorenberg 2007).
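The calculation can be sketched as follows, assuming the conventional form of the index: the number of persons aged 23–62 whose reported age is a multiple of five, divided by one-fifth of all persons aged 23–62, times 100. The function name and the sample ages are hypothetical.

```python
def whipple_index(ages, low=23, high=62):
    """Whipple's index for a list of reported ages (single years).

    Returns 100 when one in five reported ages is a multiple of five
    (no heaping) and 500 when every reported age is a multiple of five.
    """
    in_range = [a for a in ages if low <= a <= high]
    if not in_range:
        raise ValueError("no persons in the reference age range")
    heaped = sum(1 for a in in_range if a % 5 == 0)
    return 100.0 * 5 * heaped / len(in_range)


# One person at every age from 23 to 62: no digit preference
print(whipple_index(list(range(23, 63))))
# Everyone reported at age 30 or 40: complete heaping
print(whipple_index([30] * 6 + [40] * 4))
```

Ages outside the 23–62 window are ignored, so the function can be applied directly to a full listing.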

47 Of the 220 parishes included in the analysis, only two achieved an index value between 100 and 104, which is considered an attribute of highly accurate data. One of these parishes belonged to region 10 in the Ukraine.

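48 The ABCC Index reports a society’s share of individuals who probably know their true age (named after A’Hearn, Baten and Crayen as well as Greg Clark, who developed that measure). The formula is

\[ \mathrm{ABCC} = \left(1 - \frac{Wh - 100}{400}\right) \times 100 \quad \text{if } Wh \geq 100; \quad \text{otherwise } \mathrm{ABCC} = 100 \]

where Wh stands for the Whipple’s index. The index ranges from 0 to 100. If everybody reports their correct age, ABCC has a value of 100.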
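A minimal sketch of the conversion, assuming the commonly cited linear transformation of the Whipple’s index in which a Whipple value of 100 maps to an ABCC of 100 and a value of 500 maps to zero, with Whipple values below 100 set to 100. The function name is hypothetical.

```python
def abcc_index(whipple):
    """Convert a Whipple's index (0-500) into the ABCC index (0-100).

    Assumption: ABCC = (1 - (Wh - 100) / 400) * 100 for Wh >= 100,
    and ABCC = 100 for Wh below 100.
    """
    if whipple < 100:
        return 100.0
    return (1 - (whipple - 100) / 400) * 100


for wh in (100, 221, 500):
    print(wh, round(abcc_index(wh), 2))
```

Read this way, a Whipple’s index of 221 corresponds to an ABCC of roughly 70, that is, about 70 percent of individuals probably reporting an exact age.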

49 In 1918 in Albania, the Whipple’s index for males was 271 and 373 for females (Gruber, personal communication).

50 The gender equality index in numeracy (GE) was developed in Manzel and Baten (2009). It is defined as

\[ GE = -\frac{wh_f - wh_m}{wh_m} \times 100 \]

where whf and whm stand for the Whipple’s indices of females and males, respectively. The higher the measure of gender equality, the lower the share of women relative to men rounding their age up or down in a certain country. A positive (negative) gender equality index implies a female (male) numeracy advantage. The respective index for 1918 Albania is –36.8.
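A minimal sketch of the measure, assuming it is computed as the negative of the relative difference between the female and the male Whipple’s index, expressed in percent. The exact form should be checked against Manzel and Baten (2009); the function name and the input values are hypothetical.

```python
def gender_equality_index(whipple_female, whipple_male):
    """Gender equality index in numeracy.

    Assumption: GE = -(whf - whm) / whm * 100, so that heavier age
    heaping among women than among men yields a negative value.
    """
    if whipple_male <= 0:
        raise ValueError("the male Whipple's index must be positive")
    return -(whipple_female - whipple_male) / whipple_male * 100


# Hypothetical values: women heap more than men, so the index is negative
print(gender_equality_index(150, 100))
# Hypothetical values: women heap less than men, so the index is positive
print(gender_equality_index(120, 160))
```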

51 Following a methodology set out in Crayen and Baten (2010), in the figure we extended the usual reference population used for the calculation of the Whipple’s index so as to include those aged 63 to 72.

52 Societies with a higher level of age heaping may also have more age exaggerations in older cohorts (Manzel and Baten 2009, 46). This pattern is partly confirmed in our data. We did not observe more elderly people in our eastern regions, but we did find more centenarians in absolute terms there (37 people compared to 11 in the west).

53 Unlike the Whipple’s index, the Myers’ index takes into account the preference and avoidance of all 10 digits. The procedure creates a ‘blended’ population, which is essentially a weighted sum of the number of persons reporting ages ending in each of the 10 terminal digits. In the statistically ideal population, free from systematic irregularities in age reporting, the ‘blended’ sum at each digit should be approximately equal to 10 percent of the total ‘blended’ population. The method yields an index of preference for each terminal digit. If the sum at any given digit exceeds 10 percent of the total ‘blended’ population, this indicates an overselection of ages ending in that digit (i.e., digit preference). Conversely, a negative deviation (or a sum that is less than 10 percent of the ‘blended’ total) indicates an under-selection of ages ending in that digit; that is, digit avoidance. An overall measure of the extent to which there is digit preference and/or avoidance in a census age distribution is the summary index of preference which is derived as one-half of the sum of the deviations from 10 percent, each taken without regard to sign. The latter index presents an estimate of the minimum proportion of persons in the population for whom an age with an incorrect final digit is reported. The theoretical range of the Myers’ index is zero (representing no heaping), to 90, which would occur if all of the ages were reported at a single digit (Myers 1940; 1954; Hobbs 2008, 138–139).
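The blending procedure can be sketched as follows. The sketch assumes the commonly described variant in which counts by terminal digit are summed once for ages 10–89 and once for ages 20–99, the first sum is weighted by 1 to 10 and the second by 9 to 0 for digits zero to nine, and the summary index is one-half of the sum of absolute deviations from 10 percent. The age range, the function name, and the sample populations are assumptions made for illustration.

```python
def myers_index(age_counts, start=10, decades=8):
    """Myers' blended index for a mapping {single year of age: count}.

    Returns (summary_index, preferences), where preferences[d] is the
    deviation from 10 percent for terminal digit d (positive = preference,
    negative = avoidance).
    """
    blended = []
    for digit in range(10):
        # Persons with this terminal digit, starting at ages 10-19 ...
        first = sum(age_counts.get(start + digit + 10 * k, 0)
                    for k in range(decades))
        # ... and starting one decade later, at ages 20-29.
        second = sum(age_counts.get(start + 10 + digit + 10 * k, 0)
                     for k in range(decades))
        blended.append((digit + 1) * first + (9 - digit) * second)
    total = sum(blended)
    if total == 0:
        raise ValueError("no persons in the age range used for blending")
    preferences = [100.0 * b / total - 10.0 for b in blended]
    summary = sum(abs(p) for p in preferences) / 2
    return summary, preferences


uniform = {age: 50 for age in range(10, 100)}       # no digit preference
decadal = {age: 100 for age in range(10, 100, 10)}  # all ages end in zero
print(myers_index(uniform)[0])
print(myers_index(decadal)[0])
```

The per-digit deviations returned alongside the summary value are what make it possible to see preference for digits other than zero and five.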

54 In 1897 in Russia the value of the Myers’ index was 13.9 (Rowney and Stockwell 1978, 223); in 16 out of 22 African populations around 1960 it was below 11. Only in Turkey (1960), Iraq (1957), and Morocco (1960) did the index approach higher values (respectively, 22.3, 26.7, and 38.3; after Nagi et al. 1973).

55 Populations from region 7 seem to be devoid of distortions caused by any form of digit preference. On the other hand, groupings 8 and 10 must have suffered from deviations related to digit preference on years other than just multiples of five (especially digit one).

56 The sum of the deviations at ages two, three, and four is 10.9. By contrast, the sum of the deviations at ages seven, eight, and nine is 9.6. Inversed deviations at digits five and six are so close to each other in absolute terms that they seem to warrant the assumption of mutual exclusiveness.

57 These patterns are largely corroborated by calculations of the Whipple’s indexes for corresponding household membership groups. For the aggregate collection, the Whipple’s indexes for heads and non-kin co-residents were 221 and 201, respectively. If we investigate the problem separately for western and eastern parishes, then the respective values of the index will be 184/194 in the former, and 250/223 in the latter.

58 The advantage of this technique is that it can illustrate the unique effect of each specified characteristic by holding still or controlling for the effects of others (see Pampel 2000).

59 A broad geographical division into western and eastern parishes was preferable in the regression over a division into ‘regions’ due to the strong multicollinearity of the regional divisions with the type of census variable. Regions 3–5 were, for example, composed of parishes surveyed with only one specific type of listing, as was the overwhelming majority of region 7.

60 Again, age turns out to be crucial in this regard, which is not surprising since non-kin co-residents were on average younger than household heads.

61 Differences in household structures may correspond to different patterns of authority and power relations within domestic groups (Todd 1985). As such, they could be indicative of potential differentials in the extent to which the authority of a household head could play a crucial role in the very process of the age reporting of household members.

62 In the modeling process, the type of census and household status had the strongest immediate impact on the effect of domestic group structure, although each of these control variables worked in opposite directions. The first reversed the relationship between the ‘multiple family’ predictor and the outcome variable into a significant negative association, whereas the second made it positive.

63 The reason for running separate regressions instead of including the new variable in the standard dataset was that the new model required additional exclusions of groups of individuals. All of the household heads were excluded from calculations on the grounds that they could not be a reference category for themselves in the modeling process, which decreased the number of cases to 63,444 individuals. The Index of Dissimilarity for the age composition of the original population and the abbreviated one was 11.7, which demonstrated that the overall variation between the two datasets was not particularly substantial (the index represents the proportion of people who would have to move to a different age category to make the distributions identical; it can vary from zero to 100. Any index that is less than 10 indicates that the distributions are similar; see Rowland 2003). For each person two categorical variables were created that indicated the age reporting status of the head of household to which this person belonged. Simple cross-tabulations seemed to predict a strong effect of the new variables on the probability of age rounding. On average, 39 percent of people belonging to households in which the head had an age ending in zero had their ages rounded to zero, compared to only 21 percent of those with a head who stated his age differently; and the respective figures were higher for digit preference in multiples of five (53 to 31 percent).
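A minimal sketch of the Index of Dissimilarity as characterized in this note: one-half of the sum of absolute differences between the percentage distributions of two populations across the same age categories. The function name and the counts are hypothetical.

```python
def index_of_dissimilarity(counts_a, counts_b):
    """Index of Dissimilarity (0-100) between two age distributions.

    `counts_a` and `counts_b` are sequences of counts for the same
    ordered age categories. The result is the percentage of one
    population that would have to change category to match the other.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both populations need the same age categories")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("both populations must be non-empty")
    return 50.0 * sum(abs(a / total_a - b / total_b)
                      for a, b in zip(counts_a, counts_b))


# Hypothetical counts in three broad age categories
full_population = [400, 500, 100]
without_heads = [450, 470, 80]
print(round(index_of_dissimilarity(full_population, without_heads), 1))
```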

64 With reference to the notorious hardships encountered in the surveying processes in contemporary developing countries, Ewbank noted as follows: ‘In particular, the training of interviewers, their level of education, and their ability to understand and pursue the interests of the researcher will significantly affect the quality of data [on age]’ (1981, 15). See also Szulc 1920, 8.

65 This pattern is well illustrated with traditional age heaping indexes. The Whipple’s index for the ‘other’ type of listings very closely approaches the ‘fairly accurate’ standard of contemporary census microdata (114). In the Russian revision lists this value is almost three times as high (320), in the status animarum it is twice as high (237), and in the Commissions’ list it is almost 50 percent higher (168).
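
For readers unfamiliar with the measure, the following Python sketch shows how a Whipple's index can be computed from single-year ages. It assumes the conventional definition (persons aged 23 to 62 reporting an age ending in 0 or 5, relative to one-fifth of all persons aged 23 to 62, multiplied by 100); the note itself does not spell out the exact variant used, so the age range and the function name are assumptions.

```python
def whipple_index(ages):
    """Return Whipple's index for an iterable of single-year ages.

    Assumed conventional variant: ages 23-62, terminal digits 0 and 5.
    100 indicates no heaping on 0 or 5; 500 means every age ends in 0 or 5.
    """
    in_range = [age for age in ages if 23 <= age <= 62]
    if not in_range:
        raise ValueError("no persons aged 23-62 in the input")
    heaped = sum(1 for age in in_range if age % 5 == 0)
    # heaped / (total / 5) * 100, written as a single expression
    return 500.0 * heaped / len(in_range)


# Illustrative input: one person at every age from 23 to 62 (no heaping).
print(whipple_index(range(23, 63)))
```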

66 Kumor (1969) and Kotecki (2009) paid particular attention to the visible extension of the control of pastoral activities in church inspections of the 18th century in comparison to the control levels in the preceding century.

67 The literature contains numerous accounts of the sequestration and redistribution of confiscated properties during the reigns of Catherine the Great and Paul I, when such practices were used as a means of punishment against gentry found to be insufficiently loyal to the new Russian governors, or as a penalty for participating in anti-Russian political movements of the 1790s. See review in Rychlikowa 1991, esp. 61–63.

68 In this context it is interesting to note that, according to Kabuzan’s observations (1963, 131–142), in the territories of the former eastern Poland-Lithuania which were incorporated into Russia following the partitions (the Lithuanian, Belarusian, and right-bank Ukrainian provinces), the underregistration rate remained at the highest level well into the 1830s. Of the overall number of persons omitted in the 5th Revision, nearly 50 percent were in territories of Lithuania and Belarus incorporated into Russia following the partitions of the Commonwealth. Data from the following two revisions indicated underregistration rates ranging from 17 percent in Minsk province to 28 percent in Vilno province (ibid., 139–140). If, as Kabuzan has argued (1963, 142), there were more opportunities in these regions to avoid the listing than in other territories of the European part of Russia, then the mechanisms of control over the course of the enumeration process must have been decisively weaker in those areas, as well. On the insufficiencies of the technical basis necessary for the successful execution of the Russian census of 1897, see Rowney and Stockwell 1978. Jasas and Truska 1972, 15–16, and Błaszczyk 1985, 112, also commented on the superiority of the statistics applied in the Commissions’ listings over the Russian revision materials.

69 The Cataster (Kataster) was a fiscal census of a very broad territorial scope which embraced the entire Prussian state, all of the rural settlements within those territories, and within them, all of the households (see more in Cackowski 1967).

70 Age ratio is defined as 100 times the number of persons in a given age class divided by the arithmetic average of numbers in the two adjoining age categories. Age ratios should normally deviate very little from 100, except at advanced ages. In general, any considerable fluctuations in age ratios indicate either inaccuracies in age reporting or incomplete enumeration (United Nations 1952, 60; 1955, 39).
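
The definition of the age ratio given in this note translates directly into code. The Python sketch below is illustrative only: the function name and the toy counts are hypothetical, and the first and last age classes are skipped because they lack one of the two adjoining categories.

```python
def age_ratios(counts):
    """Return age ratios for all interior age classes.

    counts: persons per consecutive age class (for one sex), youngest first.
    Age ratio = 100 * count / mean(count of the two adjoining classes).
    """
    ratios = []
    for i in range(1, len(counts) - 1):
        neighbours_mean = (counts[i - 1] + counts[i + 1]) / 2.0
        if neighbours_mean == 0:
            raise ValueError("adjoining age classes are both empty")
        ratios.append(100.0 * counts[i] / neighbours_mean)
    return ratios


# Illustrative toy counts: the third class is inflated relative to its neighbours.
print(age_ratios([80, 100, 120, 100]))
```

A ratio well above 100 for a given class, as for the third class in this toy input, is the kind of fluctuation that points to heaping or uneven coverage.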

71 The inspection of the approximated 10-year birth cohorts reconstructed from the Commissions’ census counts (1791) and the Russian revisions (1795) – which together contained some 80 percent of the population of the database – shows very clearly that both census populations were relatively young (see Figures 3–4 in Ch. 4).

72 Deviations of age ratios from the standard values constitute rough indications of the degree of inaccuracy in a given distribution, provided that certain factors, which ‘normally’ disturb the regularity of the age structure, are absent. The assumption of an expected value of 100 implies that coverage errors are about the same from age group to age group, and that age reporting errors for a particular group are offset by complementary errors in adjacent groups. The lower the age-accuracy index, the more adequate the census data on age would appear to be (Hobbs 2008, 148).

73 In the present context, we define the sex ratio (often called the ‘masculinity ratio’) conventionally as the number of males per 100 females in the same age class (Shryock and Siegel 1976, 106–7; Hobbs 2008, 130).

74 In the absence of intervention, an excess of male births (between 104 and 107 males for 100 newly born females) is well documented as a biological phenomenon. Süssmilch devoted a full chapter (§ 409) of his book ‘Die Göttliche Ordnung’ (1761) to sex ratios at birth, in which he first established the average of about 1.05 for European populations. Human longevity is also strongly influenced by gender, as, given similar care, women generally have better survival rates at all ages (Visaria 1967; Waldron 1983).

75 It would be difficult to argue that higher sex ratios in these groupings (all of which refer to rural and serf populations) could be principally due to the greater migration of females to cities (see the discussion below).

76 This ratio approximates the ideal proportions only in regions 10, 3, and 1, which display sex ratios of 106, 107, and 101, respectively. In the remaining ones, a considerable deficit of males at ages 0–4 is noted (a total of seven regions both east and west). This seems even more alarming than the otherwise clear relative surplus of boys in two other groupings (which does not rule out underestimations of females in their case as well).

77 On the lands of the Polish-Lithuanian Commonwealth the compulsory military conscription of males aged 18 to 35 was passed in December 1789, i.e. two years before the census-taking action conducted with the help of the Civil-Military Commissions (Kopczyński 1998, 76). A close association of the enumeration system with military conscription also existed for the Russian revisions. The main goal of these latter head counts was to determine the number of individual taxpayers, but the lists were also used for estimating army reserves. A special order regularly set the number of men who would be recruited in a given year, but normally the recruits were chosen from among men aged 19 to 35.

78 The AGESMTH application, which is contained in the U.S. Census Bureau spreadsheet program called Population Analysis System (PAS), was used to obtain smoothed distributions through the Strong Method (see also Arriaga et al. 1994). Note that the strong smoothing formula does not smooth the youngest and oldest age groups. In its most general sense, the term ‘smoothing’ is used in demography to denote the elimination or minimization of irregularities often present in reported data or in preliminary estimates obtained from them. Various ‘smoothing techniques’ encompass a wide variety of procedures, ranging from the fitting of models to simple averaging (Smith 1992, ch. 2).

79 It is important to bear in mind that after the western cluster reaches its peak in the age group 35–54, the tendency lessens; and although the trend remains at levels that are almost certainly too high, it does not depart from the theoretically expected values as dramatically as in the case of eastern populations. Also, the tendency to downplay the share of females aged 35 or older appeared with greater consistency across all parishes. In 192 out of 220 parishes the sex ratios were above 100.

80 According to the hearth tax lists and censuses of the Grodno district population from 1789, there were more men than women aged 31–60 (Czyżewski 2009).

81 Radziejów was one of the major urban centers of region 3 in our database, whereas Wieluń performed a similar function in region 5; Olkusz was a small urban center in the western part of region 6, and Praszka, in region 5.

82 The value of the score shows the degree of the average deviation of the age-specific sex ratio (measured in persons per 100) from the sex ratio of the preceding age group (Poston 2006, 44). The assumption behind this approach is that these age-to-age changes should ideally approximate zero (Hobbs 2008, 150).
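
The sex ratio score described in this note can be sketched in Python as follows. The sketch assumes the conventional reading: sex ratios are males per 100 females in each age class (as in note 73), and the score is the mean absolute difference between the ratios of successive age classes. The function names and toy counts are hypothetical, and the sketch does not reproduce the AGESEX spreadsheet.

```python
def sex_ratios(males, females):
    """Return males per 100 females for each age class."""
    if len(males) != len(females):
        raise ValueError("male and female counts need the same age classes")
    return [100.0 * m / f for m, f in zip(males, females)]


def sex_ratio_score(males, females):
    """Return the mean absolute change in the sex ratio between successive age classes."""
    ratios = sex_ratios(males, females)
    if len(ratios) < 2:
        raise ValueError("at least two age classes are required")
    differences = [abs(later - earlier) for earlier, later in zip(ratios, ratios[1:])]
    return sum(differences) / len(differences)


# Illustrative toy counts for three consecutive age classes.
print(sex_ratio_score([105, 100, 95], [100, 100, 100]))
```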

83 The respective figures in Table 17 were calculated using the U.S. Census Bureau Population Analysis System (PAS) spreadsheet program called AGESEX.

84 In fact, however, the index has not been accepted uncritically as a measurement of the degree of inaccuracy of age statistics by the United Nations (1952, 71–77) or by the authors of more recent demography guidebooks (Hobbs 2008, 150–151).

85 This adjustment technique is a built-in feature of the AGESEX spreadsheet (see fn. 83 above). It makes allowances for population size, and thus enables us to minimize randomness in the irregularities of the numbers in various sex and age groups. The rationale, as well as the mathematical formulas for the calculation of the adjustment, are given in United Nations 1952, 72, 77–79.

86 For example, Ghana 1960 (114); Morocco 1960 (157) (see Ewbank 1981, 22–23; Ohadike and Tesfaghiorghis 1975, 23). The value of ASAI for 1918 Albania was 160 (adjusted value: 158) (personal communication with S. Gruber).

87 While these population listings are not free from errors either, it is reasonable to assume that the population age-sex distribution they reveal should have better coverage and attain greater precision than the estimates we arrived at on the basis of 18th-century enumerations. Thus, they could provide a convenient background against which we can compare our own schedules. We should keep in mind that the numbers compiled by Szulc cannot be considered as a strict equivalent of census returns. To reconstruct the populations of Polish lands from the turn of the 19th and 20th centuries, the author made use of published statistics derived from three different census enumerations: the Russian census of 1897, the census of the German Empire from 1900, and the census of the Habsburg Monarchy from 1900 (Szulc 1936). The territorial reach of Szulc’s estimates does not cover Poland in its strictly pre-partition borders, but within the boundaries from around 1938; that is, with the exclusion of a substantial portion of the eastern regions for which we possess data from the 18th century (especially the Ukrainian regions).

88 This explains why, as reported in Table 17, the sex ratio scores are almost identical for the two populations (around 10).

89 Unless the sons omitted from the registration were in fact the only offspring co-resident with the parental generation. As far as household structure is concerned, however, the worst-case scenario would entail shifts within one category of domestic group structures (nuclear with or without offspring). More troublesome would be the implications for the study of dyadic relationship patterns, as the observed deficiencies would artificially increase the proportion of the elderly not living with at least one son. Partial relief in this regard is brought about by the nearly equal degree of underregistration of young men in the western and eastern areas of Poland-Lithuania (see Figure 26).

90 Caselli and Vallin (2006, 27) drew attention to the ease with which older and widowed women in old African provinces of France escaped registration as recently as in the 20th century.

91 Normally, a standard age distribution can be constructed by computing a stable population from the Lx values of a suitable life table by applying an appropriate rate of geometric growth. By fitting the values from the reported age distribution into the model stable population pattern we arrive at an age distribution that retains some of the features of the reported distribution, and is at the same time free from obvious bias (United Nations 1983, 244 ff; 1955, 36–39). On the applicability of stationary or stable population models to micro populations from historical Poland, see Ch. 5.
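
The construction of a stable age distribution from life-table Lx values can be sketched in Python as below. This is a minimal illustration under stated assumptions: growth is applied in its continuous form, exp(-r * midpoint of the age group), rather than as a discrete geometric factor; five-year age groups are assumed; and the Lx values in the usage line are invented toy numbers, not values from any Coale-Demeny table.

```python
import math


def stable_age_distribution(Lx, r, width=5):
    """Return the proportional age distribution of a stable population.

    Lx: life-table person-years lived in each consecutive age group.
    r: intrinsic annual growth rate (e.g. 0.005 for 0.5 percent).
    width: width of each age group in years (assumed constant).
    """
    # Each group is weighted by survivorship (Lx) and discounted by growth
    # at the midpoint of the age group.
    weights = [
        L * math.exp(-r * (i * width + width / 2.0)) for i, L in enumerate(Lx)
    ]
    total = sum(weights)
    if total <= 0:
        raise ValueError("Lx values must sum to a positive number")
    return [w / total for w in weights]


# Toy Lx values for four age groups (illustrative only).
print(stable_age_distribution([4.0, 3.0, 2.0, 1.0], r=0.01))
```

With r set to zero the result reduces to the stationary population implied by the life table, and a positive r shifts the distribution toward the younger age groups.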

92 The Coale-Demeny model ‘East’ was calculated on the basis of 33 life tables from Germany (1881–1890, 1949–1951), Austria (1900–1901, 1949–1951), Poland (1931–1932), Czechoslovakia (1949–1951), but also North Italy (1921–1922). It is characterized by high mortality rates in infancy and increasingly high rates over the age of 50 (Coale and Demeny 1983, 11–12).

93 <>

94 Both parameters critical to a choice of an appropriate life table – life expectancy at birth and the increase rate – remain a subject of debate among experts of the demographic ancien régime on Polish-Lithuanian territories (see Kuklo 2009, 416–417). The proposed life expectancy values should be understood as the most plausible approximations in light of the currently available evidence. The intrinsic growth rate of 0.5–1.0 percent matches the values suggested by some researchers of the Polish demographic past (Vielrose 1957, 12; Kuklo 2009, 247), and generally corresponds to the range of values arrived at through the microsimulation model discussed in Ch. 5.

95 Arriaga’s formula assumes that a second-degree polynomial passes through the midpoint of each of three consecutive 10-year age groups and then integrates a five-year age group.

96 Although it might be possible to assess a general trend in the direction of change in age heaping among our regional populations, imputation techniques would always be arbitrary in assuming a tendency to move upwards or downwards in the age reporting of a given individual.

97 This categorization (e.g., 28–32) would approximate much better a real distribution of individuals with ages more than two digits away from the most overrepresented years. However, it seems very likely that even in this grouping a portion of the individuals who in reality would belong to the subsequent age groups based on odd numbers (e.g., 33–37) would be artificially included.