1 Introduction

On social media websites such as Twitter, a wide range of information is associated with users, including profile description, profile picture, home location, past tweets, and follower-following connections (Footnote 1). However, such platforms often do not explicitly provide user attributes such as gender, age, and various other user characteristics. Hence, a number of existing works have proposed methods to predict these attributes [1, 6, 8]. Recently, detecting specific minority groups of users has also attracted considerable attention. For example, dedicated approaches have been proposed to detect users who suffer from depression or users who are in various phases of drug addiction [2,3,4]. With regard to parenting, Morris investigates the use of social media such as Twitter and Facebook by mothers of young children [5], which provides findings related to our work.

In this paper, we present a solution for automatically detecting mothers of babies on Twitter. Similar to Morris’s study, we define a baby as a child who is less than three years old [5]. As Morris’s study shows, mothers with babies tend to actively seek information and advice from baby care experts, facility providers, and other mothers in the same situation. Various applications can be devised to meet such needs. For example, by detecting baby mothers in social media, we can determine which streets are frequently used by other mothers with young babies, and this information can then be used for specialized route recommendation. A pilot interview we conducted with baby mothers within our research group revealed that baby mothers are willing to use such applications. In general, our method for automatically detecting mothers with babies will be beneficial for mothers who wish to contact other mothers in a similar situation, and for experts and organizations that aim to provide targeted and timely information.

In existing work on Twitter user classification, the most commonly used information is users’ past tweets [6, 8, 10]. However, as Morris found in her study, baby mothers who use Twitter often do not share information about their babies in tweets, except in profile pictures [5]. Based on our investigation, a Twitter user who is a mother to a baby would nevertheless seek baby-related information by following expert and consultant accounts. Therefore, instead of using information contained in users’ tweets as in prior work, we focus on the accounts a user follows. We devise a set of new features based on followed accounts and on additional information, namely the vocabulary of the user’s tweets and the profile picture. We manually select baby mothers and non-baby mothers from real Twitter users for training and testing our prediction model. According to the experimental results, the classification accuracy using our proposed features is higher than that obtained with the established feature sets.

2 Related Work

In recent years, discovering latent Twitter user attributes has become an important research topic. For instance, Rao et al. [8] propose a machine learning approach to discover gender, age, place of origin, and political orientation based on features extracted from users’ past tweets. Pennacchiotti and Popescu [6] propose another machine learning approach to discover political affiliation and ethnicity, and to find Starbucks fans among Twitter users, based on profile, tweet behavior, and tweet contents. Sharma et al. [9] introduce a method to infer keywords associated with the biographical information and expertise of a user. However, it is unclear whether these generic methods can be effective in our task of discovering baby mothers. We compare the generic method of Rao et al. [8] with our own approach in terms of classification accuracy.

Recently, detecting special groups of social media users who are in need has also attracted research attention. MacLean et al. study users with drug addiction problems in an online forum [4]. Their focus is on detecting users in different phases of drug addiction, including using, withdrawing, and recovering. De Choudhury et al. study Twitter users who suffer from depression [3]. They propose a detection method based on several signals, including tweet posting activities, the social network graph, lexicon-based emotion analysis, linguistic style, and special words used by patients. Among several classifiers, they find that the Support Vector Machine (SVM) provides the best prediction results. The same authors conduct a similar study of mothers who recently gave birth [2]. The features they use include engagement, network, emotion, and linguistic style. Their prediction mainly focuses on detecting changes occurring between the prepartum and postpartum periods, and it is unclear whether it is effective for detecting baby mothers among different types of users. We also compare that method with our own approach. To the best of our knowledge, our work is the first proposal for categorizing SNS users according to whether they are mothers to a baby.

3 Features for Finding Baby Mothers

We follow the machine learning approach to detecting latent attributes of social media users, which includes devising features, labeling examples, and testing models. Our main contribution is that we not only propose a novel task but also introduce specific novel features that are effective for detecting mothers with babies. In this section, we discuss the features we consider in our framework, which include both existing and novel features. The data preparation and learning models are discussed in the next section.

3.1 Existing Features

Bag-of-Words on Tweets (BOW). The simple representation of users’ text as term frequencies has been a baseline method in several user classification works, including the author profiling task of the PAN 2017 competition [7]. Following [7], we use a BOW of the 1,000 most frequent terms to represent a user’s tweets.
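The BOW representation can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the tokenizer, the function names, and the input layout (a dictionary mapping each user to a list of tweets) are assumptions made for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(user_tweets, size=1000):
    """Return the `size` most frequent terms over the tweets of all users."""
    counts = Counter()
    for tweets in user_tweets.values():
        for tweet in tweets:
            counts.update(tokenize(tweet))
    return [term for term, _ in counts.most_common(size)]


def bow_vector(tweets, vocabulary):
    """Represent one user's tweets as term frequencies over the vocabulary."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return [counts[term] for term in vocabulary]
```

With `size=1000`, `build_vocabulary` yields the fixed term list, and `bow_vector` then maps each user to a vector of the same length.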

Socio-Linguistic. The first feature set we use is designed for generic attribute detection. Proposed in [8], this feature set is used with SVM for detecting gender, age, regional origin, and political orientation. The entire feature set is listed in Table 1. Most of these features are based on lexicons. Some lexicons are pre-defined, while others are generated from the data (e.g., possessive bigrams). Each feature value is the number of occurrences of the corresponding lexicon’s words in the user’s tweets.

Table 1. Socio-Linguistic features for generic user attribute detection

Postpartum. The second feature set we use is specialized for detecting the behavior of new mothers. Proposed in [2], the feature set includes posting behavior, ego-network, and linguistic style. The complete feature set is listed in Table 2. Designed to distinguish standard and extreme behavior changes in new mothers, this is the closest feature set we could find to our goal of detecting baby mothers. It considers tweeting activities such as retweeting, mentioning, and linking, as well as simple social network features such as the number of followers and followees. The main part of the feature set, however, is writing style analysis based on the LIWC (Footnote 2) and ANEW (Footnote 3) lexicons.

Table 2. Postpartum features for detecting postpartum behavior of new mothers

3.2 Novel Features

Interest Keywords. A baby mother will have a strong tendency to seek health advice, news about baby-related events, and the experiences of other mothers. We consider that it is not what the user tweets about but what she wants to read that hints at the user’s interests. On Twitter, a user most often sees the tweets posted by the accounts that she follows. So instead of the tweets a user posted, we are interested in the accounts a user chose to follow. Our first feature set is based on the descriptions of the accounts followed by a user. Specifically, we generate tf-idf scores for the words used in the followed account descriptions. To use tf-idf, we first extract frequently used keywords from all account descriptions in the data, resulting in a list of m keywords. Then we concatenate the descriptions of all accounts a user follows into one document. Last, we generate a tf-idf vector of length m. In our experiments, we set the frequency threshold to 100, resulting in a keyword list of 1,370 words. We find that different threshold values produce similar classification results.
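The three steps above (keyword selection, concatenation per user, tf-idf weighting) can be sketched in Python as follows. The exact tf-idf variant is not specified in the text, so this sketch assumes raw term counts for tf and \(\log(N/df)\) for idf, with one document per user; the function names, the tokenizer, and the dictionary-based inputs are likewise hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def select_keywords(account_descriptions, min_count=100):
    """Keep words occurring at least `min_count` times over all descriptions."""
    counts = Counter()
    for description in account_descriptions.values():
        counts.update(tokenize(description))
    return sorted(word for word, n in counts.items() if n >= min_count)


def user_documents(followees, account_descriptions):
    """Concatenate the descriptions of all followed accounts into one token list per user."""
    docs = {}
    for user, accounts in followees.items():
        tokens = []
        for account in accounts:
            tokens.extend(tokenize(account_descriptions.get(account, "")))
        docs[user] = tokens
    return docs


def tfidf_vectors(docs, keywords):
    """Return one tf-idf vector of length len(keywords) per user (assumed variant)."""
    n_docs = len(docs)
    keyword_set = set(keywords)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens) & keyword_set)
    vectors = {}
    for user, tokens in docs.items():
        tf = Counter(tokens)
        vectors[user] = [
            tf[w] * math.log(n_docs / doc_freq[w]) if doc_freq[w] else 0.0
            for w in keywords
        ]
    return vectors
```

Note that under this idf definition a keyword appearing in every user's document receives a weight of zero; a smoothed idf would behave differently.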

Mom-Words. We generate a list of words likely to be used by baby mothers, not necessarily baby-related. For this, we choose a popular online forum about parenting of babies and collect the messages posted in it. The forum we selected is MomForum.com, a dedicated sharing platform for discussing parenting and baby-related issues. We fetch 100 threads from one of the sub-forums that discusses parenting of babies aged 0–2 years (Footnote 4). We then tokenize these threads and generate a list of frequent keywords using a frequency threshold of 5 and removing stopwords. As a result, we obtain a lexicon of 299 words, including child, feeding, dad, etc. Based on this lexicon, we generate a Mom-word usage feature vector for each user, where the i-th element is the frequency with which the i-th word in the lexicon occurs in the user’s tweets.
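A minimal Python sketch of the lexicon construction and the resulting usage vector is given below. The stopword list is a tiny illustrative placeholder, and the tokenizer, function names, and the reading of the threshold as "at least 5 occurrences" are assumptions rather than details from the experiments.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real list would be much longer.
STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "my", "i", "in"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_lexicon(forum_messages, min_count=5, stopwords=STOPWORDS):
    """Keep non-stopwords that occur at least `min_count` times in the forum messages."""
    counts = Counter()
    for message in forum_messages:
        counts.update(t for t in tokenize(message) if t not in stopwords)
    return sorted(word for word, n in counts.items() if n >= min_count)


def mom_word_vector(tweets, lexicon):
    """The i-th element is the frequency of the i-th lexicon word in the user's tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return [counts[word] for word in lexicon]
```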

Pictures. We also apply recent image processing techniques to extract information from user profile pictures and generate a picture feature vector. We use a free online API (Footnote 5) that can output the age and gender of people shown in an image with relatively high accuracy. For the particular application of discovering baby mothers, we are most interested in whether the picture shows a baby. We generate a vector of four binary values, each indicating whether the profile picture contains a baby (\(age \le 3\)), a young child (\(3 < age \le 6\)), an adult woman (\(20 < age \le 50, gender = female\)), or an adult man (\(20 < age \le 50, gender = male\)). Note that several people of different age and/or gender can be present in a picture.
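The mapping from detected faces to the four binary values can be sketched as follows. The input format, a list of (age, gender) pairs, is a hypothetical stand-in for whatever the age and gender API actually returns; only the thresholds are taken from the text.

```python
def picture_features(faces):
    """Map detected faces to [baby, young child, adult woman, adult man] indicators.

    `faces` is a list of (age, gender) tuples; this layout is an assumption.
    Each indicator is 1 if at least one face in the picture satisfies the condition.
    """
    baby = any(age <= 3 for age, _ in faces)
    young_child = any(3 < age <= 6 for age, _ in faces)
    adult_woman = any(20 < age <= 50 and gender == "female" for age, gender in faces)
    adult_man = any(20 < age <= 50 and gender == "male" for age, gender in faces)
    return [int(baby), int(young_child), int(adult_woman), int(adult_man)]
```

Because each indicator is computed independently, a picture showing a mother holding a baby sets both the first and the third value.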

4 Experimental Analysis

We conduct an empirical study by testing our feature sets on data from real Twitter users. We aim to determine the effectiveness of our feature sets in comparison to the established user-classification feature sets. In this section, we present the experimental setup and discuss the results.

4.1 Dataset Preparation

We collect a number of user profiles from Twitter and manually label them according to whether they belong to baby mothers or not. Since the proportion of baby mothers among all Twitter users is small, instead of randomly collecting user accounts, we use some heuristics to select candidates before labeling. First, we identify some Twitter accounts likely to attract baby mothers, based on several handpicked Web articles (Footnote 6). These accounts post mother-related information and are expected to have a large proportion of baby mothers among their followers. Using the Twitter API for searching followers (Footnote 7), we collect all followers of the picked accounts, including their account names, descriptions, and numbers of followers and followees, resulting in a set of 18,536 users and their data. From these users, we first remove those who have more than 200 followers. This accounts for both the Twitter API limit (Footnote 8) and the practicality of our solution: celebrity users who have a large number of followers cannot be considered typical users and would distort the classification model.

We next extract a number of very likely candidates from the remaining users, namely those whose profile description contains the keyword “mother” or “mom”. Note that this filtering does not bias the classification, because the features used for building classification models do not consider these profile descriptions. Then, from the followers of the very likely candidates, we find a number of less likely candidates, for which we do not use any filtering. The intuition is that the followers of a baby mother might include other baby mothers among family and friends. Finally, we manually label positive and negative examples from these very likely and less likely candidates by looking at all available user information, including profile description, picture, and past tweets. For very likely candidates, we have 197 positives and 59 negatives, while for less likely candidates, we have 106 positives and 583 negatives. In total, we have 303 positives and 642 negatives.
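The two filtering heuristics described in this section (the follower limit and the keyword match on profile descriptions) can be sketched in Python as follows. The dictionary layout of a user record and the function names are hypothetical. The text does not say whether the keyword must match as a whole word, so the sketch assumes a case-insensitive whole-word match, which for instance does not treat "moment" as containing "mom".

```python
import re

# Assumption: case-insensitive whole-word match on "mother" or "mom".
KEYWORD_PATTERN = re.compile(r"\b(mother|mom)\b", re.IGNORECASE)


def drop_popular_users(users, max_followers=200):
    """Remove users who have more than `max_followers` followers."""
    return [u for u in users if u["followers"] <= max_followers]


def very_likely_candidates(users):
    """Keep users whose profile description contains one of the keywords."""
    return [u for u in users if KEYWORD_PATTERN.search(u["description"])]
```

Applying `drop_popular_users` first and `very_likely_candidates` second mirrors the order given above; the less likely candidates would then be drawn, without any filter, from the followers of the result.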

4.2 Learning Model and Evaluation Metrics

Rao et al. find that the Support Vector Machine (SVM) with a linear kernel provides the best classification results with Socio-Linguistic features [8], while De Choudhury et al. find that SVM with a radial kernel provides the best results with their postpartum features [2]. Accordingly, in the experimental results, we report SVM with a linear kernel, except for feature sets that include postpartum features, for which we report SVM with a radial kernel. We also investigate other machine learning models, including Naive Bayes, Random Forest, Linear Discriminant Analysis, and Logistic Regression, and find that Random Forest (RF) generally provides the best results and performs better than SVM. Thus, we show the results for SVM and RF. We use the SVM and RF implementations in the R packages e1071 (Footnote 9) and randomForest (Footnote 10).

We measure the precision, recall, and f-value of the positive predictions to assess classification accuracy. The f-value is calculated as \(\frac{2 \times precision \times recall}{precision + recall}\). We apply three-fold cross-validation, which uses two parts for training and one part for testing. For random forest, we run the experiment ten times and show the average results.
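The metrics and the fold construction can be illustrated with a short Python sketch. It is not the evaluation code used in the experiments, which were run in R. How the folds were drawn (shuffled, stratified, or neither) is not stated, so the simple interleaved split below is an assumption, as are the function names.

```python
def precision_recall_f(y_true, y_pred):
    """Precision, recall, and f-value for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


def three_fold_indices(n, k=3):
    """Yield (train, test) index lists; fold i tests every k-th example starting at i."""
    for i in range(k):
        test = list(range(i, n, k))
        train = [j for j in range(n) if j % k != i]
        yield train, test
```

Each example is used for testing exactly once and for training in the other two folds, which matches the two-parts-for-training, one-part-for-testing scheme.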

4.3 Results and Discussion

We first test the effectiveness of each individual set of the proposed features, namely, interest keywords, mom-words in tweets, and picture analysis. The results are shown in Table 3. We see that the interest keyword feature set provides the highest accuracy with the random forest classifier, reaching an f-value of 0.579. We also see that the Picture feature set achieves the highest precision with SVM but has a very low recall. This is reasonable, because if the profile picture contains a baby, the user is very likely to be a baby mother. However, a large portion of baby mothers do not show their babies in their profile pictures. Similarly, mom-word features reach a high precision but a low recall with random forest, because a user who uses these special words is likely to be a mother, but not all mothers make mother-related posts.

Table 3. Classification accuracy of individual sets of proposed features

Next, we compare the proposed feature sets with the baseline feature sets, namely BOW, socio-linguistic (SocLing), and postpartum behavior features (Postpartum). The results are shown in Table 4. In this experiment, the proposed feature set is the combination of all three feature sets discussed above. As we can see from the results, with SVM, the proposed features reach an f-value of 0.562, more than 7% higher than those reached by BOW, SocLing, and Postpartum. With random forest, the proposed features reach the highest precision of 0.785 and the highest f-value of 0.623, more than 15% higher than those reached by the three baselines.

Table 4. Classification accuracy comparison of existing and proposed features
Table 5. Effects of combining existing and proposed features

We are also interested in whether combining the proposed features with the established feature sets improves the accuracy. The results are shown in Table 5. Comparing this table to Table 4, we see that adding the proposed features improves the accuracy of all baseline feature sets. The f-value improves from 0.472 to 0.609 for BOW, from 0.469 to 0.617 for SocLing, and from 0.402 to 0.613 for Postpartum. Combining all features together, we achieve a precision of 0.808, the highest among all cases we tested. However, the achieved f-value of 0.587 is lower than that achieved by using the proposed features alone.
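The trade-off between precision and f-value can be made explicit by rearranging the f-value definition from Sect. 4.2 to recover the recall implied by a reported precision and f-value. The figures below are a rough check only: they assume that each reported precision and f-value pair comes from the same table row, and they are subject to rounding in the reported values.

\[
recall = \frac{f \times precision}{2 \times precision - f}
\]

\[
\text{all features combined: } \frac{0.587 \times 0.808}{2 \times 0.808 - 0.587} = \frac{0.4743}{1.029} \approx 0.46
\]

\[
\text{proposed features with random forest: } \frac{0.623 \times 0.785}{2 \times 0.785 - 0.623} = \frac{0.4891}{0.947} \approx 0.52
\]

Under these assumptions, the gain in precision from combining all features is offset by a drop in recall, which would account for the lower f-value.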

5 Conclusion

In this paper, we study the problem of finding mothers with babies on Twitter. Following a supervised machine learning approach, we propose novel features based on followed accounts, vocabulary used, and profile pictures. Experimental results with real Twitter user data show that the proposed features are highly effective for baby mother discovery and achieve considerably higher classification accuracy than three baselines of established feature sets. In the future, we plan to extend our approach to other user groups, such as people with disabilities or rare diseases.