[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
"Nick Cox" <n.j.cox@durham.ac.uk> |

To |
<statalist@hsphsun2.harvard.edu> |

Subject |
RE: st: Collapsing with strings |

Date |
Mon, 16 Jan 2006 13:04:35 -0000 |

L'esprit de l'escalier: When observations come in blocks of 2 another approach is possible that removes the need for sorting on each variable, although one prior sort id is needed. foreach var of varlist var1-var3 { by id: replace `var' = `var'[3 - _n] if missing(`var') } When the missing values are in the first observation, _n is 1 and 3 - _n is 2. When the second, _n is 2 and 3 - _n is 1. Hence 3 - _n indexes the other observation in blocks of 2. This works regardless of variable type. Nick n.j.cox@durham.ac.uk Nick Cox > You're correct. I misread the question. Sorry. Gary Longton > > Daphna Bassok wrote: > > > > > I have several duplicate observations in my data set. > > However, they are > > > not perfect duplicates. Only the id # is the same. So > > there might be > > > two observations with id#16 for instance, the first will > > have values for > > > some variables, and missing values for others. The second > > also have some > > > values filled and some missing. There are no cases in > > which both have > > > values- that is... either the first in the pair has the > > value OR the > > > second has a value (or neither). > > > > > > For example: suppose I have two observations with id# 16... > > The first > > > has values for var1 and 2 and not 3. The second ONLY has > > values for > > > var 3. What i would like to do is simply collapse these > > into a single > > > observation with all the relevant info. meaning, 1 > observation with > > > id#16 that has values for all three variables. > > > > > > I am trying to do this with the collapse command with no success. > > > > > > My code is: > > > > > > collapse (min) var1-var3, by(id) > > > > > > I thought this would create a new observation that has all > > the data in it. > > > > > > I am getting a "type mismatch" error. > > > > > > Is this because some of my variables are string variables? > > > > Nick Cox suggested: > > > > > What you can do is -- if your description is correct -- > > > > > > egen nmiss = rowmiss(<insert variable names>) > > > bysort id (nmiss) : keep if _n == 1 > > > > > > as the sort will sort the observation with > > > more missings to second place. > > > > and Austin Nichols suggested: > > > > > foreach v of varlist put all the relevant varnames here { > > > bys id (`v'): qui replace `v'=`v'[_n-1] if mi(`v') > > > } > > > bys id: drop if _n>1 > > > > It is a rare day when one can make a correction to a > > typically accurate and > > elegant Nick Cox solution, so I make this one fearing that > > I've probably missed > > somthing obvious. > > > > If I understand the problem correctly, I think this solution > > will discard > > non-missing data for some variables. > > > > Eg. a simplified dataset like this seems consistent with > > Daphna's description: > > > > obs id var1 var2 var3 nmiss > > > > 1 16 a . 3 1 > > 2 16 . 2 . 2 > > 3 17 . . 7 2 > > 4 17 c 3 . 1 > > > > sorting on nmiss will discard observations 2 & 3, throwing > > away non-missing data > > for var2 > > > > As Austin's solution suggests, one needs to sort separately > > for each variable in > > the list and carry out the replace for that variable. > > However missings sort > > differently for string and numeric variables, taking first > > place for strings and > > last for numerics, so need to be handled differently in a > > sort solution. > > Austin's solution won't sort correctly for string variables. > > > > There is probably a shorter approach, but I think this will do it: > > > > ******** > > foreach var of varlist var1-var3 { > > if substr("`:type `var''",1,3) == "str" { > > bysort id (`var') replace `var' = `var'[_N] > > } > > else { > > bysort id (`var') replace `var' = `var'[1] > > } > > } > > bysort id: drop if _n>1 > > ************ > > * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: Collapsing with strings***From:*Gary Longton <glongton@fhcrc.org>

- Prev by Date:
**st: RE: Log Transformation and GLM** - Next by Date:
**st: Kinds of questions** - Previous by thread:
**RE: st: Collapsing with strings** - Next by thread:
**Re: st: Collapsing with strings** - Index(es):

© Copyright 1996–2021 StataCorp LLC | Terms of use | Privacy | Contact us | What's new | Site index |