
I'm looking for a reference or a pointer in the right direction, since I'm not too familiar with machine learning or algorithms. I'd prefer to work in R, but I could also do Python. Any reference to a specific package I could use would be great.

So I have a list of Senate bills and their descriptions, which are one or two sentences long. A subset of the bills also has accompanying bill types (e.g. "budget", "immigration", etc.). I want to create an algorithm that will assign bill type(s) to the other subset of bills that don't have them.

I want to do this by learning the underlying relationship between descriptions and bill types from the first subset and applying it to the second, so that I can predict bill type from bill description. Does this make sense? Any advice would be welcome!

DCK

1 Answer


Yes, this absolutely makes sense. This is a common NLP (Natural Language Processing) problem: supervised text classification. One approach is to use word embedding models alongside an LSTM (Long Short-Term Memory) deep neural network. First, represent every word as a vector of fixed dimension (e.g. 100) using word2vec. Then build a neural network whose input is the sequence of word vectors for each bill description, and whose output is a label indicating which type of bill you are considering. I highly recommend Keras, which is a great Python library for deep learning and NLP. You can also use existing pretrained word2vec sets that contain vector representations of most words in a language (e.g. English), such as the GoogleNews word2vec vectors.
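To make this concrete, here is a minimal sketch of that pipeline in Keras. The bill descriptions and labels below are hypothetical placeholders, the tokenizer is a deliberately simple hand-rolled one, and the embedding layer is trained from scratch rather than initialized from the GoogleNews vectors (you would pass pretrained weights to the `Embedding` layer for that); treat it as a starting point, not a finished model.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Toy labelled subset (hypothetical stand-ins for real bill descriptions).
descriptions = [
    "a bill to amend the federal budget for the next fiscal year",
    "a bill to reform immigration visa processing",
    "a bill appropriating additional funds in the budget for infrastructure",
]
labels = np.array([0, 1, 0])  # e.g. 0 = "budget", 1 = "immigration"
num_classes = 2
maxlen = 20  # pad/truncate every description to this many words

# Step 1: map each word to an integer index (0 is reserved for padding),
# so each description becomes a fixed-length sequence of word indices.
vocab = {}
def encode(text):
    ids = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab) + 1
        ids.append(vocab[word])
    ids = ids[:maxlen]
    return ids + [0] * (maxlen - len(ids))

x = np.array([encode(d) for d in descriptions])

# Step 2: Embedding turns each word index into a 100-dim vector; the LSTM
# reads the resulting sequence of vectors; the final Dense layer outputs a
# probability for each bill type.
model = Sequential([
    Embedding(input_dim=1000, output_dim=100, mask_zero=True),
    LSTM(64),
    Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x, labels, epochs=2, verbose=0)

# Step 3: predict bill types for unlabelled descriptions.
probs = model.predict(x, verbose=0)  # one probability row per description
predicted_types = probs.argmax(axis=1)
```

With only a few hundred labelled bills, a pretrained embedding (or an even simpler baseline such as TF-IDF plus logistic regression) will usually generalize better than an embedding learned from scratch, so it is worth trying both.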

Ryan Ghorbandoost