mariuszoican/MartineauZoican2021_InfoContribution

How to run the NLP code?

  1. First, we train the model using \train_models\LDA_model_estimation.py, or LDAMarius.slurm on a multi-core system. (Illustrative Python sketches for each step follow the numbered list below.)

    • The model files are saved under .\pretrained_models
    • Estimation logs are saved under .\train_models\Logs
    • Perplexity scores are saved under .\train_models\Output
  2. Second, we run \train_models\study_topics.py to plot perplexity against the number of topics and select the optimal number of topics.

    • Output is a graph, perplexity_topics.pdf.
    • The file also outputs the top ten words for each topic, given a (manually) prespecified number of topics X: topics_terms_n=X.pdf.
  3. The file industry_gettopics.py generates a quarter-industry panel of topic loadings, saved as .\IndustryAnalysis\topic_loadings_by_industryquarter.csv

  4. The script .\IndustryAnalysis\industry_toptopics.py generates the top two topics (with their word lists) for each GIC code and saves them in TopTopics_Industries.csv.

  5. The script build_shapley.py (together with ShapleyMarius.slurm) generates panels of Shapley values by analyst-ticker-quarter (including information diversity and contribution), saved in the OutputShapley folder.

  6. Use merge_shapley.py in the OutputShapley folder to generate a DataShapley.csv file.

  7. Run get_technicaldummy.py to obtain a file with analyst-level topic loadings on technical-analysis topics (DataShapley_TechnicalTopicWeights.csv).

  8. The complete merged file (DataShapley.csv + DataShapley_TechnicalTopicWeights.csv) is saved as Data_InfoContributionAnalyst.csv.
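
The sketches below walk through the pipeline step by step. They are minimal illustrations of what each script plausibly does, not the repository's actual code: library choices (gensim, pandas, matplotlib), the toy data, file and column names not mentioned above, topic grids, and topic ids are all assumptions. Step 1, a sketch of LDA estimation assuming gensim:

```python
# Illustrative sketch of step 1 (LDA estimation), assuming gensim.
# The toy documents and the topic grid are stand-ins, not the repository's
# actual corpus or settings.
from gensim.corpora import Dictionary
from gensim.models import LdaMulticore

tokenized_reports = [                        # stand-in for the tokenised report corpus
    ["revenue", "growth", "margin", "guidance"],
    ["chart", "resistance", "momentum", "breakout"],
    ["earnings", "beat", "estimate", "dividend"],
    ["support", "trend", "volume", "moving", "average"],
]

dictionary = Dictionary(tokenized_reports)
corpus = [dictionary.doc2bow(doc) for doc in tokenized_reports]

for num_topics in (5, 10, 15):               # assumed grid of candidate topic counts
    lda = LdaMulticore(corpus=corpus, id2word=dictionary,
                       num_topics=num_topics, passes=10, workers=2)
    lda.save(f"lda_{num_topics}.model")      # .\pretrained_models in the actual pipeline
    # log_perplexity returns a per-word bound; perplexity = 2 ** (-bound)
    print(num_topics, lda.log_perplexity(corpus))
```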
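Step 2, continuing the sketch above (it reuses `corpus` and the saved models), a possible way to plot perplexity against the number of topics and print the top ten words per topic:

```python
# Sketch of step 2: perplexity vs. number of topics, plus top words per topic.
# Continues the step-1 sketch (reuses `corpus` and the saved models).
import matplotlib.pyplot as plt
from gensim.models import LdaMulticore

topic_grid = (5, 10, 15)                     # must match the grid used in step 1
perplexities = [2 ** (-LdaMulticore.load(f"lda_{k}.model").log_perplexity(corpus))
                for k in topic_grid]

plt.plot(topic_grid, perplexities, marker="o")
plt.xlabel("Number of topics")
plt.ylabel("Perplexity")
plt.savefig("perplexity_topics.pdf")

X = 10                                       # manually prespecified number of topics
lda = LdaMulticore.load(f"lda_{X}.model")
for topic_id, words in lda.show_topics(num_topics=X, num_words=10, formatted=False):
    print(topic_id, [word for word, _ in words])
```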
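Step 3 builds a quarter-industry panel of topic loadings. A pandas sketch, where the report-level frame and its column names (gic_code, quarter, topic_*) are assumptions about the intermediate data:

```python
# Sketch of step 3: aggregate report-level topic loadings to a
# quarter-industry panel. Column names are assumed, not the repo's schema.
import pandas as pd

reports = pd.DataFrame({                     # stand-in for per-report topic loadings
    "gic_code": [4510, 4510, 2030],
    "quarter":  ["2019Q1", "2019Q1", "2019Q1"],
    "topic_0":  [0.10, 0.30, 0.60],
    "topic_1":  [0.90, 0.70, 0.40],
})

topic_cols = [c for c in reports.columns if c.startswith("topic_")]
panel = reports.groupby(["gic_code", "quarter"])[topic_cols].mean().reset_index()
panel.to_csv("topic_loadings_by_industryquarter.csv", index=False)
```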
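Step 4 extracts the two highest-loading topics per GIC code, together with their word lists. The sketch continues from the step-3 `panel` / `topic_cols` and the step-2 `lda` model:

```python
# Sketch of step 4: top two topics per GIC code, with their top words.
# Reuses `panel` / `topic_cols` from step 3 and `lda` from step 2.
long = panel.melt(id_vars=["gic_code", "quarter"], value_vars=topic_cols,
                  var_name="topic", value_name="loading")
avg = long.groupby(["gic_code", "topic"], as_index=False)["loading"].mean()
top2 = avg.sort_values("loading", ascending=False).groupby("gic_code").head(2).copy()

top2["top_words"] = (top2["topic"].str.replace("topic_", "", regex=False).astype(int)
                     .apply(lambda t: ", ".join(w for w, _ in lda.show_topic(t, topn=10))))
top2.to_csv("TopTopics_Industries.csv", index=False)
```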
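Step 5 computes Shapley values by analyst-ticker-quarter. The paper's information-diversity and contribution measures are not reproduced here; the sketch below only illustrates exact Shapley computation over analyst coalitions within one ticker-quarter, with a placeholder value function (number of distinct topics covered):

```python
# Generic sketch of step 5: exact Shapley values for analysts covering one
# ticker-quarter. The value function (coverage of distinct topics) is a
# placeholder, NOT the paper's information-diversity / contribution measure.
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley value of `value` (a set -> float function) for each player."""
    n = len(players)
    shapley = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                shapley[p] += weight * (value(s | {p}) - value(s))
    return shapley

# Placeholder data: topics each analyst's reports load on in a ticker-quarter.
analyst_topics = {"analystA": {1, 4}, "analystB": {1, 7}, "analystC": {2}}

def topic_coverage(coalition):
    # placeholder value function: number of distinct topics covered
    return len(set().union(*(analyst_topics[a] for a in coalition))) if coalition else 0

print(shapley_values(list(analyst_topics), topic_coverage))
```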
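Step 6 merges the per-chunk Shapley outputs into DataShapley.csv. A sketch assuming the chunks are CSV files sitting in OutputShapley/:

```python
# Sketch of step 6: concatenate per-chunk Shapley outputs into DataShapley.csv.
import glob
import pandas as pd

chunks = [pd.read_csv(path) for path in sorted(glob.glob("OutputShapley/*.csv"))]
pd.concat(chunks, ignore_index=True).to_csv("OutputShapley/DataShapley.csv", index=False)
```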
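Step 7 produces analyst-level weights on the technical-analysis topics. In the sketch, the technical topic ids and the input file and columns are illustrative assumptions; in the actual pipeline the technical topics would be identified from the step-2 topic-term output:

```python
# Sketch of step 7: aggregate each analyst's loadings on the topics tagged as
# "technical analysis". Topic ids, input file, and columns are illustrative.
import pandas as pd

TECHNICAL_TOPIC_IDS = [3, 7]                              # illustrative, not the real ids
loadings = pd.read_csv("topic_loadings_by_analyst.csv")   # hypothetical input file

tech_cols = [f"topic_{t}" for t in TECHNICAL_TOPIC_IDS]
loadings["technical_weight"] = loadings[tech_cols].sum(axis=1)
loadings[["analyst_id", "technical_weight"]].to_csv(
    "DataShapley_TechnicalTopicWeights.csv", index=False)
```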
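Step 8 merges the two files into the final dataset. A sketch assuming both share an analyst identifier column (here called analyst_id):

```python
# Sketch of step 8: merge the Shapley panel with the technical-topic weights.
# The merge key (analyst_id) is an assumption about the two files' schemas.
import pandas as pd

shapley = pd.read_csv("OutputShapley/DataShapley.csv")
technical = pd.read_csv("DataShapley_TechnicalTopicWeights.csv")

merged = shapley.merge(technical, on="analyst_id", how="left")
merged.to_csv("Data_InfoContributionAnalyst.csv", index=False)
```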

About

Data and code for Martineau and Zoican (2021): building an information contribution measure for sell-side analyst reports
