Questions tagged [parsing]
33 questions
14 votes, 2 answers
Sentiment data for Emoji
For experiments, we'd like to use the Emoji embedded in many Tweets as ground truth/training data for simple quantitative sentiment analysis. Tweets are usually too unstructured for NLP to work well.
Anyway, there are 722 Emoji in Unicode 6.0,…
Erich Schubert
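The idea of using emoji as noisy labels (often called distant supervision) can be sketched as follows. The emoji-to-polarity table is a tiny hypothetical example, not a published lexicon. A tweet is labeled by the sign of the summed polarity of its emoji. The emoji are then stripped from the text so a classifier cannot simply memorize them.

```python
from typing import Optional, Tuple

# Hypothetical polarity table; a real experiment would need a curated mapping.
EMOJI_POLARITY = {
    "\U0001F600": 1,   # grinning face
    "\U0001F60D": 1,   # smiling face with heart-eyes
    "\U0001F622": -1,  # crying face
    "\U0001F620": -1,  # angry face
}


def weak_label(tweet: str) -> Optional[Tuple[str, str]]:
    """Return (text_without_emoji, label), or None if no usable emoji signal.

    Tweets without known emoji, or whose emoji cancel out, are skipped.
    """
    score = 0
    found = False
    kept_chars = []
    for ch in tweet:
        if ch in EMOJI_POLARITY:
            score += EMOJI_POLARITY[ch]
            found = True
        else:
            kept_chars.append(ch)
    if not found or score == 0:
        return None
    text = " ".join("".join(kept_chars).split())  # collapse leftover whitespace
    return text, "positive" if score > 0 else "negative"


for t in ["Great game tonight \U0001F600", "Missed my train again \U0001F622\U0001F620", "No emoji here"]:
    print(weak_label(t))
```

Many emoji are multi-code-point sequences (skin-tone modifiers, variation selectors, ZWJ sequences). A per-character lookup like this one therefore only covers single-code-point emoji.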
8 votes, 3 answers
Is parsing files an application of machine learning?
I presently receive files from a device in a semi-CSV format. I have written a simple recursive descent parser for getting information out of these files. Every time the device updates firmware, I have a new version of the parser for the changes…
Myles
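For reference, a minimal recursive-descent field parser for a CSV-like line might look like the following. The format is an assumption: comma-separated fields, optional double quotes, and doubled quotes as escapes. The device's actual semi-CSV format is not shown in the question, and the class name is hypothetical.

```python
class SemiCsvParser:
    """Recursive-descent parser for one line of a hypothetical CSV-like format.

    Grammar (assumed for illustration):
        line   := field ("," field)*
        field  := quoted | bare
        quoted := '"' (any char except '"' | '""')* '"'
        bare   := (any char except ',')*
    """

    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def parse_line(self):
        fields = [self.parse_field()]
        while self.peek() == ",":
            self.pos += 1  # consume the separator
            fields.append(self.parse_field())
        if self.peek() is not None:
            raise ValueError(f"unexpected {self.peek()!r} at position {self.pos}")
        return fields

    def parse_field(self):
        return self.parse_quoted() if self.peek() == '"' else self.parse_bare()

    def parse_quoted(self):
        self.pos += 1  # skip the opening quote
        chars = []
        while True:
            ch = self.peek()
            if ch is None:
                raise ValueError("unterminated quoted field")
            self.pos += 1
            if ch == '"':
                if self.peek() == '"':  # a doubled quote is an escaped quote
                    chars.append('"')
                    self.pos += 1
                else:
                    return "".join(chars)  # closing quote
            else:
                chars.append(ch)

    def parse_bare(self):
        start = self.pos
        while self.peek() not in (",", None):
            self.pos += 1
        return self.text[start:self.pos]


print(SemiCsvParser('id,"name, with comma",42').parse_line())
# ['id', 'name, with comma', '42']
```

Keeping one method per grammar rule may make it easier to see which rule a firmware change affects.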
5 votes, 1 answer
semi-structured text parsing using machine learning
I am looking for a method to parse semi-structured textual data, i.e. poorly formatted data that usually has the visual structure of a matrix. It may vary a lot in content and in the number of items, may or may not have headers, and may be…
mic
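Before reaching for machine learning, a rule-based baseline can be useful for comparison. One such heuristic is assumed here rather than taken from the question. It treats runs of two or more spaces as column separators. It treats the first row as a header only if that row has no numeric cells while later rows do.

```python
import re

COL_SEP = re.compile(r"\s{2,}")  # assumption: columns are separated by 2+ spaces


def _is_number(cell: str) -> bool:
    try:
        float(cell.replace(",", ""))  # allow thousands separators like 1,000
        return True
    except ValueError:
        return False


def parse_block(text: str):
    """Split a visually aligned text block into (header_or_None, rows)."""
    rows = [COL_SEP.split(line.strip()) for line in text.splitlines() if line.strip()]
    if len(rows) > 1:
        first_numeric = any(_is_number(c) for c in rows[0])
        later_numeric = any(_is_number(c) for r in rows[1:] for c in r)
        if not first_numeric and later_numeric:
            return rows[0], rows[1:]
    return None, rows


header, rows = parse_block("Name      Qty    Price\nWidget    4      2.50\nGadget    10     13.00\n")
print(header, rows)
```

Rows with a varying number of cells come out as lists of different lengths. Those rows are where a learned model would have to take over.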
4 votes, 2 answers
How to extract important phrases (which may contain company names) from resumes?
I have thousands of CVs/resumes. We want to build a parser that can extract company names from a resume.
So far we have tried:
Maintaining a list of common words that appear in company names (e.g. Org, Ltd, Limited, Technologies, etc.) and using them to…
khirod
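The keyword-list approach described in the question can be sketched with a regular expression. It finds runs of capitalized tokens that end in a known company suffix. The suffix list is illustrative, not exhaustive. This baseline will miss company names that lack such a suffix.

```python
import re

# Illustrative suffixes: the question's examples plus a few common ones.
COMPANY_SUFFIXES = ["Org", "Ltd", "Limited", "Technologies", "Inc", "LLC", "Corp"]

_SUFFIX_ALT = "|".join(map(re.escape, COMPANY_SUFFIXES))
# One to four capitalized words (letters, digits, &, ., -), then a suffix word.
COMPANY_RE = re.compile(r"\b((?:[A-Z][\w&.-]*\s+){1,4}(?:" + _SUFFIX_ALT + r"))\b")


def find_companies(text: str) -> list[str]:
    return [m.group(1) for m in COMPANY_RE.finditer(text)]


print(find_companies("Worked at Acme Widgets Ltd from 2015 to 2018, then joined Globex Technologies."))
```

A capitalized word directly before the name (for example a sentence-initial "Then") would be absorbed into the match. In practice, the matches are usually filtered further or used as weak labels for a learned named-entity model.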
4 votes, 4 answers
What machine learning algorithms to use for unsupervised POS tagging?
I am interested in an unsupervised approach to training a POS-tagger.
Labeling is very difficult, and I would like to test a tagger for my specific domain (chats), where users typically write in lowercase, etc. If it matters, the data is mostly in…
Tido
4 votes, 2 answers
Information extraction with reinforcement learning, feasible?
I was wondering whether one could use reinforcement learning (which is becoming more and more popular with Google DeepMind's work on AlphaGo) to parse and extract information from text.
For example, could it be a competitive approach to structured…
mic
3 votes, 1 answer
Can metadata be used to adapt parsing when the delimiter appears unescaped inside a field?
I have pipe-delimited data coming from a source system. The pipe was chosen over the comma because it was believed that no field contained a pipe, while fields were known to contain commas. After ingesting this data into Hive, however, it has been…
Chris Simokat
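One way to use metadata here rests on two assumptions. First, the metadata gives the expected number of columns. Second, it names the single column that holds free text. The surplus pieces can then be re-joined into that column. Neither assumption comes from the question, which does not say what metadata is available. If more than one column can contain pipes, the split is ambiguous and this approach does not apply.

```python
def split_with_schema(line: str, n_cols: int, free_text_col: int, sep: str = "|") -> list[str]:
    """Split a delimited line, folding extra separators back into one free-text column.

    Assumes (hypothetically) that only column `free_text_col` may contain an
    unescaped separator and that every record really has `n_cols` fields.
    """
    parts = line.split(sep)
    extra = len(parts) - n_cols
    if extra < 0:
        raise ValueError(f"expected {n_cols} fields, got {len(parts)}: {line!r}")
    if extra == 0:
        return parts
    # Fields left of the free-text column are taken as-is from the left,
    # fields right of it are taken as-is from the right; the middle is re-joined.
    left = parts[:free_text_col]
    middle = sep.join(parts[free_text_col:free_text_col + extra + 1])
    right = parts[free_text_col + extra + 1:]
    return left + [middle] + right


print(split_with_schema("42|ACME|note with | a pipe|2020-01-01", n_cols=4, free_text_col=2))
# ['42', 'ACME', 'note with | a pipe', '2020-01-01']
```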
2 votes, 0 answers
How to write a simple rule-based datetime range parser in python?
The dateparser package fails to detect texts like the following and generate a date range:
'last 2 weeks of 2020': should return 18th December 2020 - 31st December 2020
'first three quarters of 2018': should return 1st January 2018 - 30th September…
Zing
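Here is a minimal rule-based sketch for the two patterns quoted in the question, using only the standard library. It reads "last N weeks of YEAR" as the final 7×N days ending on 31 December. That reading is an assumption, but it reproduces the expected 18 December 2020 to 31 December 2020. The number-word table is a small hypothetical one.

```python
import calendar
import re
from datetime import date, timedelta

# Small hypothetical word-to-number table; extend as needed.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

PATTERN = re.compile(
    r"^(first|last)\s+(\d+|[a-z]+)\s+(weeks?|months?|quarters?)\s+of\s+(\d{4})$"
)


def _to_int(token: str) -> int:
    if token.isdigit():
        return int(token)
    if token in NUMBER_WORDS:
        return NUMBER_WORDS[token]
    raise ValueError(f"unknown number word: {token!r}")


def parse_range(text: str) -> tuple[date, date]:
    """Parse phrases like 'last 2 weeks of 2020' into an inclusive (start, end) pair."""
    m = PATTERN.match(text.strip().lower())
    if not m:
        raise ValueError(f"unsupported phrase: {text!r}")
    edge, count_token, unit, year_token = m.groups()
    n, year = _to_int(count_token), int(year_token)

    if unit.startswith("week"):
        if edge == "first":
            start = date(year, 1, 1)
            return start, start + timedelta(days=7 * n - 1)
        end = date(year, 12, 31)
        return end - timedelta(days=7 * n - 1), end

    months = n * 3 if unit.startswith("quarter") else n
    if not 1 <= months <= 12:
        raise ValueError(f"range does not fit in one year: {text!r}")
    if edge == "first":
        last_day = calendar.monthrange(year, months)[1]  # days in the final month
        return date(year, 1, 1), date(year, months, last_day)
    return date(year, 13 - months, 1), date(year, 12, 31)


print(parse_range("last 2 weeks of 2020"))
print(parse_range("first three quarters of 2018"))
```

Each new phrasing needs its own rule. That is the usual trade-off of this approach against a general-purpose library.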
2 votes, 1 answer
Parse PDF into JSON or XML
I want to create a neural net that can extract specific words from a PDF document into JSON or XML. For example, let's assume that I have a PDF containing some information about countries, and I want to retrieve each country's name and population…
H.Mateur
2 votes, 2 answers
Parsing data from a string
I think this is something that experienced programmers do all the time. But, given my limited programming experience, please bear with me.
I have an Excel file in which some cells contain entries that read
[[{"from": "4", "response": true, "value":…
Juanito
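The cell content shown looks like JSON, so Python's standard json module can likely parse it directly. The full string is truncated in the question, so the cell value below is hypothetical. Only the "from", "response" and "value" keys come from the excerpt, and the number 7 is made up.

```python
import json

# Hypothetical complete cell value; the real one is truncated in the question.
cell = '[[{"from": "4", "response": true, "value": 7}]]'

data = json.loads(cell)                        # list of lists of dicts
rows = [d for inner in data for d in inner]    # flatten to a list of dicts
record = rows[0]
print(record["from"], record["response"], record["value"])
# 4 True 7
```

If the spreadsheet is loaded with pandas, the same call can be applied to each cell of the column, e.g. with Series.map(json.loads).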
2 votes, 1 answer
Passing variables and values from an R script to a shell script
I'm working with a shell script (#!/bin/sh), and I want to know whether there is a way to access variables and their values from an R script that I call in my shell script.
If that doesn't make sense: I want to create, for example, a data frame…
Ka_Papa
2 votes, 1 answer
regex for resume parsing
I am using regex to extract specific sections from resumes, such as key skills, summary, and work experience. The approach involves:
First, I extract the text from the resume based on predefined sections (e.g., "Skills," "Summary," "Work…
Arfa Ahsan
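The header-based section extraction described in the question can be sketched in two steps. First, find each known header at the start of a line. Then take the text up to the next header. The header names extend the question's examples. The layout is an assumption: a header is taken to be a line containing only that phrase, optionally followed by a colon.

```python
import re

# Section headers based on the question's examples; real resumes need more variants.
SECTION_HEADERS = ["Skills", "Key Skills", "Summary", "Work Experience", "Experience"]

# A header line: optional indentation, a known header, optional colon, nothing else.
_HEADER_RE = re.compile(
    r"^[ \t]*(" + "|".join(map(re.escape, SECTION_HEADERS)) + r")[ \t]*:?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_sections(text: str) -> dict[str, str]:
    """Map each header found (lowercased) to the text up to the next header."""
    matches = list(_HEADER_RE.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1).lower()] = text[m.end():end].strip()
    return sections


resume = "Jane Doe\nSummary\nData engineer.\nKey Skills:\nPython, SQL\nWork Experience\nAcme Ltd\n"
print(extract_sections(resume))
```

Anchoring headers to whole lines keeps words like "experience" inside ordinary sentences from being mistaken for section starts.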
1 vote, 1 answer
How to build a parse tree with BNF
I need to build a parse tree for some source code (in Python or any other programming language described by a CFG).
So, I have source code in some programming language and the BNF of this language.
Can anybody give some advice on how I can build a parse tree in this…
Simplex
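For a small grammar, a common approach is a hand-written recursive-descent parser. It has one function per BNF rule, and each function returns a tree node. The arithmetic grammar below is a hypothetical stand-in for a real language's BNF. Left-recursive rules such as `<expr> ::= <expr> "+" <term>` must first be rewritten as repetition, as done here. For Python source specifically, the standard library's ast module (ast.parse) already builds a syntax tree, though an abstract one rather than a full parse tree.

```python
import re
from dataclasses import dataclass, field

# Hypothetical grammar (BNF with left recursion rewritten as repetition):
#   <expr>   ::= <term> { ("+" | "-") <term> }
#   <term>   ::= <factor> { ("*" | "/") <factor> }
#   <factor> ::= NUMBER | "(" <expr> ")"

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


@dataclass
class Node:
    label: str                                    # nonterminal name or token text
    children: list = field(default_factory=list)  # empty for leaf tokens


def tokenize(src: str) -> list[str]:
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"cannot tokenize at position {pos}: {src[pos:pos + 10]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, src: str):
        self.tokens = tokenize(src)
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def eat(self, expected=None) -> Node:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return Node(tok)

    def parse(self) -> Node:
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return tree

    def expr(self) -> Node:            # <expr>
        node = Node("expr", [self.term()])
        while self.peek() in ("+", "-"):
            node.children += [self.eat(), self.term()]
        return node

    def term(self) -> Node:            # <term>
        node = Node("term", [self.factor()])
        while self.peek() in ("*", "/"):
            node.children += [self.eat(), self.factor()]
        return node

    def factor(self) -> Node:          # <factor>
        if self.peek() == "(":
            return Node("factor", [self.eat("("), self.expr(), self.eat(")")])
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected number or '(', got {tok!r}")
        return Node("factor", [self.eat()])


tree = Parser("2 * (3 + 4)").parse()
print(tree)
```

For a full programming language, a parser generator that reads the grammar directly is usually more practical than writing every rule by hand.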
1 vote, 0 answers
Fastest way to parse regex in R
I need to parse around 1.6k REGEX expressions such as the pair I am writing below.
I also have around 7k documents (half a page long each, on average) that need to be parsed according to the REGEX expressions.
Right now I am…
Luisda
1 vote, 1 answer
What is ChunkParserI in nltk.chunk? What exactly is it used for?
from nltk.chunk import ChunkParserI
from nltk.chunk.util import conlltags2tree
from nltk.corpus import gazetteers

class LocationChunker(ChunkParserI):
    def __init__(self):
        self.locations = set(gazetteers.words())
…
Payal Bhatia