
Hi,
I am using Java regular expressions to merge consecutive capitalized words with underscores, e.g. "New York" becomes "New_York", as well as words that have accented characters in between. My pattern works fine if the match is found in the middle of the input string, e.g. "Near left bank of Rio Amazonas".
But if the string is "On Rio Amazonas", then with my patterns I get "On_Rio_Amazonas". What I want to know is whether a regular expression can skip the first word of the sentence, so that I don't need to worry about a sentence starting with a capitalized word. I am using the following patterns:
if (description.length() > 0) {
    // Two or more consecutive capitalized words (original pattern)
    Pattern p1 = Pattern.compile("\\p{Lu}\\p{L}+(\\s+\\p{Lu}\\p{L}+)+");
    // A capitalized word, a connector (de, das, da, do, dos), then a capitalized word;
    // the connectors are grouped so that "|" applies only to them
    Pattern p2 = Pattern.compile("(\\p{Lu}\\p{L}+)\\s+(?:de|das|da|do|dos)\\s+(\\p{Lu}\\p{L}+)+");

    Matcher m1 = p1.matcher(description);
    while (m1.find()) {
        String oldStringG1 = m1.group();
        String newString = oldStringG1.replaceAll(" ", "_");
        description = description.replace(oldStringG1, newString);
        // Restart matching on the updated string
        m1 = p1.matcher(description);
    }

    Matcher m2 = p2.matcher(description);
    while (m2.find()) {
        String oldStringG1 = m2.group();
        String newString = oldStringG1.replaceAll(" ", "_");
        description = description.replace(oldStringG1, newString);
        m2 = p2.matcher(description);
    }
}
Looking forward to your help.
Thanks, YJS
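
One possible way to skip the first word is a negative lookbehind that forbids a match from starting at the very beginning of the input. The sketch below is only an illustration under stated assumptions: the class name CapitalizedWordMerger and the method mergeCapitalizedRuns are hypothetical, the input is assumed to be a single sentence with no leading whitespace, and only the plain capitalized-word pattern is handled (not the de/das/da/do/dos connectors).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapitalizedWordMerger {

    // (?<!^)      the match may not begin at the very start of the input
    // (?<!\p{L})  the match must begin at the start of a word, not inside one
    // followed by two or more capitalized words separated by whitespace
    private static final Pattern CAPITALIZED_RUN = Pattern.compile(
            "(?<!^)(?<!\\p{L})\\p{Lu}\\p{L}+(?:\\s+\\p{Lu}\\p{L}+)+");

    // Hypothetical helper: joins each matched run of capitalized words with "_".
    public static String mergeCapitalizedRuns(String description) {
        Matcher m = CAPITALIZED_RUN.matcher(description);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Replace every whitespace run inside the match with one underscore
            String merged = m.group().replaceAll("\\s+", "_");
            m.appendReplacement(out, Matcher.quoteReplacement(merged));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "On Rio_Amazonas"
        System.out.println(mergeCapitalizedRuns("On Rio Amazonas"));
        // prints "Near left bank of Rio_Amazonas"
        System.out.println(mergeCapitalizedRuns("Near left bank of Rio Amazonas"));
    }
}
```

The trade-off is that a name at the very start of the input, such as "New York is big", is left unmerged, because the first word is never allowed to begin a match. Building the result with appendReplacement in a single pass also avoids re-creating the matcher after every replacement.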